Am I allowed to scrape my own stats page?

Message boards : Number crunching : Am I allowed to scrape my own stats page?
wujj123456

Joined: 5 Sep 04
Posts: 40
Credit: 20,877,975
RAC: 219
China
Message 2018322 - Posted: 9 Nov 2019, 21:56:55 UTC
Last modified: 9 Nov 2019, 22:18:22 UTC

I just noticed today that I am also hit by the nvidia driver issue in the pinned post. However, I didn't find out until I casually looked at my BOINC manager. I usually just leave my computers crunching and don't really check every day. Still, I'd like to know if some WU is failing or if an application suddenly takes much longer to finish.

I've asked before about such a task API, but it seems a detailed stats API doesn't exist for SETI (https://setiathome.berkeley.edu/forum_thread.php?id=83683). I tried parsing logs, but it's tedious to maintain such a cron job across all my computers and aggregate the results. Extracting timing information from the logs is also not very accurate, so I abandoned that approach halfway.

Since I can just log in and view my tasks, could I scrape that with a script? It won't be very often; I probably only need to do this once every few hours at most. I just want to check whether this is against any policy, since not all websites are OK with scrapers. I don't want to get myself accidentally banned.
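For what it's worth, here's a minimal sketch of the kind of script I have in mind. It only uses the standard library and parses a saved copy of a tasks page; the table layout, task IDs, and status strings below are made up for illustration, since I don't know the real page markup:

```python
# Sketch: extract task rows from a saved HTML tasks page.
# The sample table below is hypothetical, not the real SETI@home markup.
from html.parser import HTMLParser


class TaskTableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_td = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = None
        elif tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self._row[-1] += data.strip()


def parse_tasks(html):
    parser = TaskTableParser()
    parser.feed(html)
    return parser.rows


# Hypothetical sample of what a scraped tasks table might look like:
sample = """
<table>
<tr><td>8012345678</td><td>Completed and validated</td><td>1,234.56</td></tr>
<tr><td>8012345679</td><td>Error while computing</td><td>12.34</td></tr>
</table>
"""
rows = parse_tasks(sample)
errors = [r for r in rows if "Error" in r[1]]
print(len(rows), len(errors))  # → 2 1
```

A cron job could fetch the page every few hours, run this, and mail me if the error count jumps.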
ID: 2018322
Oddbjornik (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 2018324 - Posted: 9 Nov 2019, 22:13:38 UTC - in response to Message 2018322.  

I just noticed today that I am also hit by the nvidia driver issue in the pinned post. However, I didn't find out until I casually looked at my boinc manager. I usually just leave my computers crunching, not really checking every day. However, I'd like to know if some WU is failing or some application suddenly takes much longer to finish.

I've asked before for such a task API but it seems detailed stats API doesn't exist for SETI (https://setiathome.berkeley.edu/forum_thread.php?id=83683). I tried parsing logs, but it's kinda tedious to maintain such a cron across all my computers and aggregate them. Trying to extract time information from log is also not very accurate. Thus I abandoned it halfway.

Since I can just log in and view my tasks, could I scrap that with a script? It won't be very often. I probably only need to do this once a few hours at most. I just want to check if this is against any policy since not all websites are OK with scrappers. I don't want to get myself accidentally banned.
It's getting late up here, so it took me a while to realise that you mean scrape, and not scrap.

Just a friendly heads-up to any other sleepyheads who may be more competent than me at answering your question. No offense intended.
ID: 2018324
wujj123456

Joined: 5 Sep 04
Posts: 40
Credit: 20,877,975
RAC: 219
China
Message 2018326 - Posted: 9 Nov 2019, 22:19:02 UTC - in response to Message 2018324.  

I just noticed today that I am also hit by the nvidia driver issue in the pinned post. However, I didn't find out until I casually looked at my boinc manager. I usually just leave my computers crunching, not really checking every day. However, I'd like to know if some WU is failing or some application suddenly takes much longer to finish.

I've asked before for such a task API but it seems detailed stats API doesn't exist for SETI (https://setiathome.berkeley.edu/forum_thread.php?id=83683). I tried parsing logs, but it's kinda tedious to maintain such a cron across all my computers and aggregate them. Trying to extract time information from log is also not very accurate. Thus I abandoned it halfway.

Since I can just log in and view my tasks, could I scrap that with a script? It won't be very often. I probably only need to do this once a few hours at most. I just want to check if this is against any policy since not all websites are OK with scrappers. I don't want to get myself accidentally banned.
It's getting late up here, so it took me a while to realise that you mean scrape, and not scrap.

Just a friendly heads-up to any other sleepyheads who may be more competent than me at answering your question. No offense intended.

Thank you for the correction! I've edited my post to avoid further confusion.
ID: 2018326
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2018327 - Posted: 9 Nov 2019, 22:28:01 UTC - in response to Message 2018322.  

A couple of things:

1. Yes, you "can" (as in, you are able to) scrape your own tasks; I've done it plenty. The speed seems to be limited by page loads, so scraping a lot of tasks takes a long time: figure 1-2 seconds per scrape, since each task's info is on a different page and you need to open each page individually to get the data. This inherently reduces the load impact of scraping, since it's forced to go so slowly anyway. If you aren't doing it a lot, I highly doubt anyone would ever notice.

2. Any tasks viewed via the website are already completed and returned tasks. If your goal is to intervene in some task that's hung, scraping will not help you.

3. I see you have at least one system on Linux. If these are only crunchers, and you otherwise don't NEED Windows for anything in particular, you might consider switching your Windows systems to Linux and running a different app: we have a "special" Linux-only CUDA 9/10 app that not only doesn't have the issues described in the sticky post, but is also about 3-4x faster.
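To put point 1 in code, here's a rough sketch of the pacing involved, with the fetch function stubbed out so it runs offline. The URLs are placeholders, and the 2-second default just matches the rough page-load estimate above:

```python
# Sketch: fetch one task-detail page at a time with a fixed pause between
# requests. The fetch function is injected so the pacing logic can be
# exercised offline with a stub instead of real HTTP calls.
import time


def polite_fetch_all(urls, fetch, delay_s=2.0):
    """Fetch each URL in turn, pausing delay_s seconds between requests."""
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_s)  # be gentle with the server
        results[url] = fetch(url)
    return results


# Offline usage example with a stub fetcher and no delay:
pages = polite_fetch_all(
    ["task/1", "task/2", "task/3"],        # placeholder URLs
    fetch=lambda u: "<html>%s</html>" % u,  # stub instead of a real request
    delay_s=0,
)
print(len(pages))  # → 3
```

With the real 2-second delay, a few hundred tasks already takes on the order of ten minutes, which is why the load impact stays low.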
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2018327
wujj123456

Joined: 5 Sep 04
Posts: 40
Credit: 20,877,975
RAC: 219
China
Message 2018333 - Posted: 9 Nov 2019, 23:10:05 UTC - in response to Message 2018327.  
Last modified: 9 Nov 2019, 23:13:11 UTC

A couple of things:

1. Yes, you "can" (as in, you are able to) scrape your own tasks; I've done it plenty. The speed seems to be limited by page loads, so scraping a lot of tasks takes a long time: figure 1-2 seconds per scrape, since each task's info is on a different page and you need to open each page individually to get the data. This inherently reduces the load impact of scraping, since it's forced to go so slowly anyway. If you aren't doing it a lot, I highly doubt anyone would ever notice.

2. Any tasks viewed via the website are already completed and returned tasks. If your goal is to intervene in some task that's hung, scraping will not help you.

3. I see you have at least one system on Linux. If these are only crunchers, and you otherwise don't NEED Windows for anything in particular, you might consider switching your Windows systems to Linux and running a different app: we have a "special" Linux-only CUDA 9/10 app that not only doesn't have the issues described in the sticky post, but is also about 3-4x faster.

Thanks! Yeah, I figured no one would notice; I just want to err on the safe side in case this breaks any policy. I don't intend to intervene in stuck tasks. I just want to alert myself if lots of tasks start to error out or time out. It's OK to suffer some loss of work for a day or two, but I don't want it to last for weeks.
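Something like this toy rule is all I'm after; the outcome strings and the 25% threshold below are made-up assumptions for illustration, not real values from the site:

```python
# Sketch: flag a host when too many of its recently returned tasks failed.
# Outcome strings and thresholds are hypothetical placeholders.

def should_alert(outcomes, threshold=0.25, min_tasks=10):
    """Return True if the failure fraction over recent tasks exceeds threshold."""
    if len(outcomes) < min_tasks:
        return False  # not enough data to judge
    bad = sum(1 for o in outcomes if o in ("Error while computing", "Timed out"))
    return bad / len(outcomes) > threshold


ok = ["Completed and validated"] * 9
print(should_alert(ok + ["Error while computing"]))  # 1/10 = 10% → False
print(should_alert(ok + ["Timed out"] * 6))          # 6/15 = 40% → True
```

That way a bad day or two stays quiet, but a driver issue that kills tasks for weeks would trip the alert quickly.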

I crunch on all of my computers for various projects. Unfortunately, both my Windows rigs are gaming rigs, though I only game intensively on one, which has the latest (faulty) driver. To be honest, I wouldn't use Windows unless forced to, like for some games. Dual-booting and switching between them proved to be too much hassle for me. The Linux server doesn't have any GPU installed for now. I did see someone mention the optimized Linux app when searching the forum, but I wasn't able to find the link. Do you have a link to the application thread? I might add a GPU to my server if I come across a good deal this holiday, and I sure want to run the most efficient app.
ID: 2018333
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2018352 - Posted: 10 Nov 2019, 3:02:54 UTC - in response to Message 2018333.  

I wasn't able to find the link. Do you have a link to the application thread? I might add a GPU to my server if I come across a good deal this holiday, and I sure want to run the most efficient app.


The app is here:
http://www.arkayn.us/lunatics/BOINC.7z

and the discussion of it is in this thread:
Setting up Linux to crunch CUDA90 and above for Windows users
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2018352
wujj123456

Joined: 5 Sep 04
Posts: 40
Credit: 20,877,975
RAC: 219
China
Message 2018459 - Posted: 11 Nov 2019, 1:20:58 UTC - in response to Message 2018352.  

I wasn't able to find the link. Do you have a link to the application thread? I might add a GPU to my server if I come across a good deal this holiday, and I sure want to run the most efficient app.


The app is here:
http://www.arkayn.us/lunatics/BOINC.7z

and the discussion of it is in this thread:
Setting up Linux to crunch CUDA90 and above for Windows users

I dusted off my diskless PXE boot setup just to give it a shot this weekend, and that speed is... quite tempting. I also run some other long-running projects, and rebooting between two OSes would throw away lots of work. I have to keep at least one of my computers on Windows all the time, but I'll probably switch the other one with a GPU over to Linux.

Just curious: where are these optimized builds usually announced? I think I should probably start tracking them myself instead of bothering others. It doesn't really make sense to burn the same power and do less than 1/5 of the work... I searched again using your link, but all I found were ad hoc threads like this one.
ID: 2018459
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2018460 - Posted: 11 Nov 2019, 1:37:58 UTC - in response to Message 2018459.  

Unfortunately, due to the nature of these forums, it's impossible to keep one thread updated. Topics can be "stickied", but posts can't be edited more than one hour after posting, so you can't have the first post continually updated like you would normally see on other forums.

That results in new threads for everything, and in info being buried in massive topics with thousands of posts or dispersed across lots of separate topics.

It's the worst aspect of these forums, but I doubt it's going to change anytime soon.

Luckily, the link that Keith posted is typically kept up to date, and new packages get uploaded to the exact same URL, so as long as you use that link you'll most likely get the most up-to-date version.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2018460



 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.