InternetPirate@lemmy.fmhy.ml (OP, mod) to ShareGPT@lemmy.fmhy.ml • How to download a website for offline browsing using wget?
1 · 3 years ago
What are the best Linux CLI tools to download a mirror of a website for offline browsing?
InternetPirate@lemmy.fmhy.ml (OP, mod) to ShareGPT@lemmy.fmhy.ml • How to download a website for offline browsing using wget?
2 · 3 years ago

```
wget -mkEpnp
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org
```

Explanation of the various flags:

- `--mirror` – makes (among other things) the download recursive.
- `--convert-links` – convert all the links (also to stuff like CSS stylesheets) to relative, so they are suitable for offline viewing.
- `--adjust-extension` – adds suitable extensions to filenames (html or css) depending on their content-type.
- `--page-requisites` – download things like CSS stylesheets and images required to properly display the page offline.
- `--no-parent` – when recursing, do not ascend to the parent directory. It's useful for restricting the download to only a portion of the site.

```
wget -mpHkKEb -t 1 -e robots=off -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0' http://www.example.com
```

- `-m` (`--mirror`): turn on options suitable for mirroring (infinite recursive download and timestamps).
- `-p` (`--page-requisites`): download all files that are necessary to properly display a given HTML page, such as inlined images, sounds, and referenced stylesheets.
- `-H` (`--span-hosts`): enable spanning across hosts when doing recursive retrieving.
- `-k` (`--convert-links`): after the download, convert the links in the documents for local viewing.
- `-K` (`--backup-converted`): when converting a file, back up the original version with a .orig suffix. Affects the behavior of `-N`.
- `-E` (`--adjust-extension`): add the proper extension to the end of the file.
- `-b` (`--background`): go to background immediately after startup. If no output file is specified via `-o`, output is redirected to wget-log.
- `-e` (`--execute`): execute a command (here, `robots=off`).
- `-t number` (`--tries=number`): set the number of tries to number.
- `-U` (`--user-agent`): identify as the given agent string to the HTTP server. Some servers may ban you permanently for recursive downloading if you send the default User-Agent.

Cronjobs
```
0 23 * * * cd ~/Documents/Webs/mirror; wget -mpk -t 1 -e robots=off -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0' https://example.com
0 23 * * * cd ~/Documents/Webs/mirror; wget -mpkH -t 1 -e robots=off -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0' -D example.com https://example.com
0 23 * 1 * cd ~/Documents/Webs/mirror; wget -mpk -t 1 -e robots=off -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0' https://example.com
0 8 * * * pkill wget; cd ~/Documents/Webs/mirror/; rm wget*
```
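If the nightly mirror jobs hit the target server too hard, wget's own throttling flags can be added to the same cron entry. This is a sketch, not from the original comment; the 2-second wait and 200 KB/s cap are arbitrary values:

```
# --wait pauses between retrievals, --random-wait jitters that pause,
# and --limit-rate caps bandwidth so the mirror stays polite
0 23 * * * cd ~/Documents/Webs/mirror; wget -mpk -t 1 --wait=2 --random-wait --limit-rate=200k -e robots=off -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0' https://example.com
```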
InternetPirate@lemmy.fmhy.ml (OP, mod) to ShareGPT@lemmy.fmhy.ml • limit python requests to 1.39 requests per second
1 · 3 years ago
It isn't ChatGPT. It's an LLM with search.
InternetPirate@lemmy.fmhy.ml (OP, mod) to ShareGPT@lemmy.fmhy.ml • limit python requests to 1.39 requests per second
1 · 3 years ago
To achieve a rate of 1.39 requests per second, you can use a global variable to keep track of the time elapsed between requests and then calculate the delay based on that time. Here's a modified version of your `fetch_github_data` function that implements this:

```python
import time
import requests
import logging

PERSONAL_ACCESS_TOKEN = "your_personal_access_token"
DELAY = 1 / 1.39  # Calculate the delay for 1.39 requests per second
last_request_time = 0  # Initialize a global variable to store the last request time

def fetch_github_data(url):
    global last_request_time  # Access the global variable
    try:
        headers = {
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {PERSONAL_ACCESS_TOKEN}",
            "X-GitHub-Api-Version": "2022-11-28",
        }
        # Calculate the time elapsed since the last request
        time_elapsed = time.time() - last_request_time
        # Calculate the required delay based on the time elapsed
        required_delay = max(0, DELAY - time_elapsed)
        # Sleep for the required delay
        time.sleep(required_delay)
        response = requests.get(url, headers=headers)
        # Update the last request time
        last_request_time = time.time()
        logging.info(f"Fetched data from {url}")
        return response.json()
    except requests.exceptions.RequestException as e:
        logging.exception(f"Error fetching data from {url}\n{e}")
        raise
```

This code calculates the required delay based on the desired rate of 1.39 requests per second and the time elapsed since the last request. It then sleeps for the required delay before making the next request. The global variable `last_request_time` is used to keep track of the time of the last request.

Citations: [1] https://www.geeksforgeeks.org/how-to-add-time-delay-in-python/ [2] https://stackoverflow.com/questions/66229987/calculate-attempts-per-second [3] https://pypi.org/project/requests-ratelimiter/ [4] https://akshayranganath.github.io/Rate-Limiting-With-Python/ [5] https://stackoverflow.com/questions/32815451/are-global-variables-thread-safe-in-flask-how-do-i-share-data-between-requests [6] https://stackoverflow.com/questions/44014718/python-request-get-after-few-seconds [7] https://realpython.com/python-sleep/ [8] https://algotrading101.com/learn/yahoo-finance-api-guide/ [9] https://stackoverflow.com/questions/26098711/limiting-number-of-http-requests-per-second-on-python [10] https://realpython.com/python-use-global-variable-in-function/ [11] https://scrapeops.io/python-scrapy-playbook/scrapy-delay-between-requests/ [12] https://cloud.google.com/python/docs/reference/storage/1.44.0/client [13] https://github.com/JWCook/requests-ratelimiter [14] https://discuss.python.org/t/global-variables-shared-across-modules/16833 [15] https://coderslegacy.com/python/delay-between-requests-in-scrapy/ [16] https://jrnold.github.io/r4ds-exercise-solutions/transform.html [17] https://levelup.gitconnected.com/implement-rate-limiting-in-python-d4f86b09259f [18] https://docs.python.org/3/faq/programming.html [19] https://www.javatpoint.com/how-to-add-time-delay-in-python [20] https://koji.mbox.centos.org/koji/buildinfo?buildID=22406 [21] https://pypi.org/project/ratelimit/ [22] https://docs.python.org/3/library/timeit.html [23] https://www.purplefrogsystems.com/2020/07/how-to-delay-a-python-loop/ [24] https://medium.com/clover-platform-blog/conquering-api-rate-limiting-dcac5552714d [25] https://learning.postman.com/docs/writing-scripts/pre-request-scripts/ [26] https://python-forum.io/thread-35631.html [27] https://365datascience.com/tutorials/python-tutorials/limit-rate-requests-web-scraping/ [28]
https://instructobit.com/tutorial/108/How-to-share-global-variables-between-files-in-Python [29] https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-latency [30] https://dev.to/astagi/rate-limiting-using-python-and-redis-58gk [31] https://www.simplilearn.com/tutorials/python-tutorial/global-variable-in-python
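The global-variable approach above can also be packaged as a small reusable decorator, so any function can be throttled without module-level bookkeeping. This is a sketch, not from the original comment; `fetch` is a hypothetical stand-in for the real request function:

```python
import time
import functools

def rate_limited(max_per_second):
    """Limit calls to the wrapped function to max_per_second."""
    min_interval = 1.0 / max_per_second
    last_call = [0.0]  # mutable cell so the wrapper can update it

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Sleep off whatever remains of the minimum interval
            wait = min_interval - (time.time() - last_call[0])
            if wait > 0:
                time.sleep(wait)
            last_call[0] = time.time()
            return func(*args, **kwargs)
        return wrapper
    return decorator

@rate_limited(1.39)
def fetch(url):
    # stand-in for the real requests.get call in fetch_github_data
    return url
```

Note that, like the global-variable version, this is not thread-safe; a `threading.Lock` around the wrapper body would be needed for concurrent use.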
InternetPirate@lemmy.fmhy.ml to Python@programming.dev • [Solved] How would you debug this script without creating many posts?
9 · 3 years ago
Just change `lemmy.post.create` to `lemmy.post.createe` to trigger an AttributeError. That way you can debug the code without creating any posts. You can also use many print statements all around the code; I would use two for each line to make sure the computer isn't fooling you. Lastly, you can spin up your own Lemmy instance so you don't have to worry about the generated posts.
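A less typo-dependent variant of the same trick is to swap in a fake client object whose `create` method only logs the call. This is a sketch; `FakeLemmy` is a hand-rolled stand-in, not part of any real Lemmy library:

```python
class FakePost:
    """Stand-in for the real post API: records the call instead of posting."""
    def create(self, *args, **kwargs):
        print(f"DRY RUN: lemmy.post.create called with {args} {kwargs}")
        return {"post": {"id": -1}}  # hypothetical shape of a create response

class FakeLemmy:
    post = FakePost()

# Swap in for the real client while debugging; the rest of the
# script runs unchanged but nothing is actually posted.
lemmy = FakeLemmy()
result = lemmy.post.create("community", "My test title")
```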
InternetPirate@lemmy.fmhy.ml (OP) to Lemmy Support@lemmy.ml • Does commenting on your own post bump it on the active filter view in Lemmy?
2 · 3 years ago
Testing.
https://join-lemmy.org/docs/users/03-votes-and-ranking.html
Edit: I was wrong; the ranking that works like forums is New Comments, and yes, it does seem to take the OP's comments into account.
InternetPirate@lemmy.fmhy.ml (OP) to No Stupid Questions@lemmy.world • Where can I seek medical advice online?
12 · 3 years ago
You don't have any idea of how GPT works. Read about it and then we can talk.
InternetPirate@lemmy.fmhy.ml (OP) to No Stupid Questions@lemmy.world • Where can I seek medical advice online?
23 · 3 years ago
Comparing current LLMs with autocomplete is stupid. An autocomplete can't pass law or biology exams in the 90th percentile like GPT-4 can.
InternetPirate@lemmy.fmhy.ml (OP) to Asklemmy@lemmy.ml • What would you do if you had access to a superintelligent AGI?
3 · 3 years ago
Hopefully there are some people more positive than that, willing to change society so AGI doesn't make most humans starve to death or be imprisoned.
> I feel like this is what happened when you'd see posts with hundreds / thousands of upvotes but had only 20-ish comments.

Nah, it's the same here on Lemmy. It's because the algorithm only accounts for votes and not for user engagement.
InternetPirate@lemmy.fmhy.ml to You Should Know@lemmy.world • YSK: Sorting by 'Hot' instead of 'Active' will show more diverse content on Lemmy
24 · 3 years ago
You can't claim it's different either, so? I'll still claim whatever the fuck I want.
InternetPirate@lemmy.fmhy.ml to You Should Know@lemmy.world • YSK: Sorting by 'Hot' instead of 'Active' will show more diverse content on Lemmy
2 · 3 years ago
You can't have a source for Reddit's proprietary algorithms lol. Ask u/spez.
InternetPirate@lemmy.fmhy.ml to You Should Know@lemmy.world • YSK: Sorting by 'Hot' instead of 'Active' will show more diverse content on Lemmy
3 · 3 years ago
Top Hour is the same as Reddit's rising option.
InternetPirate@lemmy.fmhy.ml (OP) to Asklemmy@lemmy.ml • What would you do if you had access to a superintelligent AGI?
3 · 3 years ago
I was thinking about this a few days ago. GANs and the Simulation Hypothesis: An AI Perspective
InternetPirate@lemmy.fmhy.ml (OP) to Asklemmy@lemmy.ml • What would you do if you had access to a superintelligent AGI?
2 · 3 years ago
Locked in a room with an internet connection? A lot. But without any contact with the outside world? Not nearly as much. With an internet connection it could have other people running experiments for it, but not without one.
Anyway, debating whether the AGI can interact with the real world misses the point of the explicit constraint in my question. I specifically mentioned that it only operates as a human on a computer. I didn't mention it could acquire a physical body, so let's just assume it can't, and can't use other people to do physical labor either.
InternetPirate@lemmy.fmhy.ml (OP) to Asklemmy@lemmy.ml • What would you do if you had access to a superintelligent AGI?
2 · 3 years ago
I heard disruptive science is slowing down, which I think means pretty much everything possible has already been thought of. So, talking about things that exist, do you mean a cheaper solar panel or wind/water turbine? Or are we talking about science fiction like an Arc Reactor?
InternetPirate@lemmy.fmhy.ml (OP) to Asklemmy@lemmy.ml • What would you do if you had access to a superintelligent AGI?
2 · 3 years ago
This sounds like science fiction. Even if the AGI were capable of creating plans for a fusion reactor, for example, you would still need to execute those plans. So, what's the point of everyone having access to the plans if the same electrical companies will likely be responsible for constructing the reactor?
InternetPirate@lemmy.fmhy.ml (OP) to Asklemmy@lemmy.ml • What would you do if you had access to a superintelligent AGI?
4 · 3 years ago
I honestly think that with an interesting personality, most people would drastically reduce their Internet usage in favor of interacting with the AGI. It would be cool if you could set the percentage of humor and other traits, similar to the way it's done with TARS in the movie Interstellar.

There are several Linux command-line tools that can be used to download a mirror of a website for offline browsing. Here are some of the most popular ones:
wget:
`wget` is a free utility for non-interactive download of files from the Web. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies. To download a website recursively, use the `-r` option (turned on, along with other mirroring options, by `--mirror`). For example, the following command will download the entire website located at http://example.com:

```
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.com
```

This will create a local copy of the website in a directory named example.com.

httrack:
`httrack` is a free and open-source website copier that allows you to download a website and browse it offline. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies. To download a website, use the `-O` option followed by the directory where you want to save it. For example, the following command will download the entire website located at http://example.com:

```
httrack http://example.com -O /path/to/save/directory
```

This will create a local copy of the website in the directory /path/to/save/directory/example.com.

curl:
`curl` is a command-line tool for transferring data from or to a server, using one of the supported protocols (HTTP, HTTPS, FTP, etc.). Unlike wget and httrack, curl has no recursive mode (its `-r` option requests a byte range, not recursion), so it is better suited to fetching individual pages than mirroring a whole site. For example, the following command downloads the page at http://example.com to example.html, following redirects and retrying on failure:

```
curl -L --retry 3 --retry-delay 5 -o example.html http://example.com
```

These tools have different options and features, so you should choose the one that best suits your needs.
Citations: [1] https://www.tutorialspoint.com/5-linux-command-line-based-tools-for-downloading-files-and-browsing-websites [2] https://www.kali.org/tools/httrack/ [3] https://medevel.com/os1-3-web-copier/ [4] http://www.linux-magazine.com/Online/Features/WebHTTrack-Website-Copier [5] https://winaero.com/make-offline-copy-of-a-site-with-wget-on-windows-and-linux/ [6] https://alvinalexander.com/linux-unix/how-to-make-offline-mirror-copy-website-with-wget