vBulletin Spiders Directory (download updated spiders_vbulletin.xml files and submit the spiders you know)
Heya,
I made a system where people can submit spiders and download updated spiders_vbulletin.xml files for their forum. After you submit a spider I must approve it for it to be included in the list.
Hope this helps everyone.
http://spiderlist.codeforgers.com
================================================
FYI, this is a nice mod made by Paul M. to keep track of your daily spiders visits:
http://www.vbulletin.org/forum/showthread.php?t=167278
Added spider Internet for learning.
Ah good, wasn't sure whether that was enough information. :) New to this spider identification.
Here are all reported spiders so far: http://www.vbulletin.com/forum/showthread.php?t=76662
Feel free to compare it against your list.
Thought of making a .htaccess version (or similar) for Apache, Solaris, BSD, Linux, Windows, Mac. It should not be that hard to make a list of bad bots. :)
Here's one for you to add to the list:
Accoona
209.212.73.133
accoona-a133.client.pins.net
2 more guests and one of them is the Accoona again. Accoona keeps showing up as a guest.
server1932015481.serverpool.info
Mozilla/5.0 (compatible; http://www.whoisde.de/2.1; +http://www.whoisde.de)
accoona-a133.client.pins.net
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1
I've been trying to 'watch' my guest ips lately to add some to your list, but I really don't get too many other than yahoo, google, and msnbot (which I already have listed on my site).
This is excellent, now I can see the actual names instead of a forum packed full of hungry guests.
Also: I submitted a spider he's called "Visions"
Did you submit the user agent? The "ident" field is not the IP, it's the user agent. You can get that on the who's online page.
This thread seems to be picking up a couple IM,ing feeds.. :)
Thank you Dream
I totally forgot about the spider list :D
This is really good
Thanks again ;)
Wow just updated my xml and caught 10 Yeti spiders already. I think they are from www.naver.com, but I can't be sure as the site is in chinese.
Yes I use that too. Were you using the latest XML?
I didn't want to add everything because some spiders don't even exist anymore.
I ask your help to submit the spiders you know exist to the system.
Added GurujiBot.
Thanks :)
A little question about livebot spider.
Livebot
Isn't this a computer with the .NET Framework installed?
Added another:
TinEye/1.1 (http://tineye.com/crawler.html)
I don't understand what you mean.
MR K, I'm not sure what's up, the list should have detected the bot with one of those two idents used... You said they were being detected before? Did you upload the xml to the correct folder?
yes, absolutely, it runs because it detects the other bots like google or yahoo ... i put the file into the /includes/xml folder ...
updated
Thanks, added as eBingBong.
There's no way to detect it by IP address?
No one congratulated me for my leet captcha skills :(
Thanks Floris, I only didn't add all of those because I think most don't exist anymore. I could use some help checking the ones that do exist and registering them.
Thanks, approved both.
That's weird, omgili is already on the list.
Very needed, thank you.
_V
No Yahoo! Slurps are being detected? You should post this in the support forums.
:rolleyes:
The ident couldn't be used...
I think the help in the spider site has info on that.
Hello,
The last file "Thu 26th Jun 2008" doesn't work.
No spiders are shown. I think the file is not correct.
Thank you.
They show fine for me. Maybe you got a bad copy.
the spider was added
if you submit more, just put in the ident field the string that is unique to the spider, no need for those "Mozilla 4.0" etc, and I'm not sure but I think that can change. I'm gonna do a FAQ now and add this to it.
I use spiders as random "foes" appearing on my forum. MSN as an Orc, Yahoo as a Goblin, etc. :cool:
http://www.comicguide.net/images/smilies/lolrot.gif
Three weeks and not one submission :(
And you probably won't get any until they see an update of some sort released. That seems to get people motivated.
Just to let you know why I like this project so much:
I show adsense adverts to guests only.
I have the hack that shows spiders separately on the home page.
By having the spiders accurately identified, I can get a very quick glance from the home page of the size of the audience viewing adverts at that moment.
I'm thinking of creating a vBulletin hack to add a scheduled task to fetch the spider list weekly to ensure it's never too far out of date. Not daily as it doesn't change that much and the traffic to your server might be excessive if it ended up a popular hack.
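A rough sketch of the weekly-refresh idea, written in Python rather than as a vBulletin scheduled task. The thread does not give the download URL for the XML file, so `url` is something the caller has to supply, and the helper names `needs_refresh` and `refresh_spider_list` are made up for illustration:

```python
import os
import time
import urllib.request

ONE_WEEK = 7 * 24 * 60 * 60  # seconds


def needs_refresh(path, max_age=ONE_WEEK, now=None):
    """Return True if the local file is missing or at least max_age seconds old."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) >= max_age


def refresh_spider_list(url, path, max_age=ONE_WEEK):
    """Download url to path only when the local copy is stale.

    Returns True if a download happened, False if the copy was fresh enough.
    The url is an assumption supplied by the caller, not taken from the thread.
    """
    if not needs_refresh(path, max_age):
        return False
    tmp_path = path + ".tmp"
    urllib.request.urlretrieve(url, tmp_path)
    os.replace(tmp_path, path)  # swap the new file in as a single step
    return True
```

Checking the file's age before fetching keeps the traffic to the list server down to roughly one request per forum per week, which is the concern raised above.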
Twiceler is already on the list.
Excellent work, thank you! :)
You are welcome :)
Dream, the snapshots spiders aren't detected ... (48 hours are gone) i know this cause i perfectly know its IP ... :)
I added your Snapbot, see if you can detect them now :)
You are the man! Excellent job! ;)
Why is the xml file so much smaller than the other one?
I noticed that, because it doesn't have the spider type I think.
Removed, problem solved.
Added spider Najdi.si.
Thanks, added.
Sorry I didn't notice that was the IP.
There's one Accoona in the xml, but I never said it was duplicated, did I? Anyway, when you see it as a guest in your forums again, paste the User Agent for me ok? So I can fix it.
edit: oh yes, someone else submitted it for you actually.
I really don't care if Dream added it or not.... it's not a spider, period.
If he chooses to add it, or not add it - despite the glaring evidence it is NOT a spider, then that's totally up to him.
A massive part of my job involves checking log files and I stand to be corrected only if you can post me URLs proving this to be a spider.
As far as I am concerned, I've proved it isn't - if you wish to call black white, then please post the proof.
You ought to learn to read closer. The last spider I submitted was not the FunWebProducts that you are referring to. That issue is done. The spider I posted, I posted the link for the proof.
Nope, I added it, thanks.
I noticed that, because it doesn't have the spider type I think.
Shouldn't the type be in there?
i've just found another spider that isn't recognized (from yahoo) ... check the attachment ...
Ok, I apologise.... I must have mis-interpreted the conversation.
Here are some regular bots on my boards:-
85.225.137.240
Mozilla/4.0 (BejiBot Crawler 1.2a)
88.131.106.7
Speedy Spider (http://www.entireweb.com/about/search_tech/speedy_spider/)
82.80.252.110
BoardTracker (http://www.boardtracker.com/spider.html) (Mozilla/4.0 compatible; MSIE 6.0; Linux Cent
61.247.217.36
Yeti/0.01 (nhn/1noon, yetibot@naver.com, check robots.txt daily and follow it)
209.11.177.198
Mozilla/4.0 (compatible; BOTW Spider; +http://botw.org)
142.166.3.122
R6_CommentReader(www.radian6.com/crawler)
thanks
added BejiBot, Bot W, Radian6 Comment Reader, Radian6 FeedFetcher and Yeti
the others were already there
1,000 downloads :)
Here are all reported spiders so far: http://www.vbulletin.com/forum/showthread.php?t=76662
Feel free to compare it against your list.
That one hasn't been updated for a while though. Is there anything newer?
Thanks :), I added it.
Exactly. I have no idea why it showed up as a Guest, but it did, sir. ;).
Here is a list I use.
http://www.botsvsbrowsers.com/category/1/index.html
It's the biggest I've seen so far.
First. is that fairly accurate? And second, wanna share the xml file for it? ;)
Another one not showing up ...
38.98.19.67
Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
yep
Cool, I had no idea there were this many bots out there. Awesome, thanks for making it. So do you just import it somehow into vbulletin?
A second Yeti bot was added.
I just submitted the ident string for the Accoona spider that keeps showing up as a guest.
Added
Internet Research Institute UK
Scrubby
There has to be a way around this even if it means us coming up with a hack to do it. Are you game?
Sorry this goes beyond my interest in the problem, but if you get someone to do it I'll add IPs to the system.
Whatever you guys decide is good by me, let me know if I have to remove the spider.
I never used it before doing this project either.
I thought about making a mailing, I'll do it eventually. Just subscribe this thread for now.
not sure of the email or website, but i submitted this ...
livebot-65-55-165-84.search.live.com
livebot-65-55-165-114.search.live.com
65.55.165.84 (http://www.gangroomforum.com/online.php?do=resolveip&ipaddress=65.55.165.84) & 65.55.165.114 (http://www.gangroomforum.com/online.php?do=resolveip&ipaddress=65.55.165.84)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)
is this what you're looking for? thanks for all the work you do with the spider list. it's greatly appreciated. :)
Thanks, added.
Hi,
I'd like to add that a few spiders/bots are not being removed using your list together with the mod I described above. See the attached picture.
Maybe Yeti isn't being removed because it identifies itself as Yeti/1.0, see screenshot.
Cheers,
Gabriel.
Very nice! I'll be using this.
ok removed mrsputnik and added google mobile spider
I'm just the spider list janitor :P
Ok, I was using the old spiders XML file to find spiders on my forum to add to the system, but I realized I would never remove it for fear of missing a spider. So, I added all spiders from the old XML to the system, and we'll remove the ones that don't exist anymore.
So, now we have 400 spiders in the system, and I need a hug after all this work or I'll cry. :p
You deserve a hug. give me a hug.
Regards
The ident or user agent is the ID of the spider, that shows in the who's online when you choose to show the user agent, for example
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
In this case, this is the Google spider with the ident Googlebot.
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
This is the Yahoo! spider with ident Yahoo! Slurp.
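To make the ident idea concrete, here is a minimal sketch that assumes detection is a case-insensitive substring test of each ident against the full user agent. That is how the thread describes it, but vBulletin's actual matching code is not shown here, and the `SPIDERS` table and `find_spider` helper are hypothetical names:

```python
# Hypothetical ident -> display name table, using the two idents explained above.
SPIDERS = {
    "Googlebot": "Google",
    "Yahoo! Slurp": "Yahoo! Slurp",
}


def find_spider(user_agent, spiders=SPIDERS):
    """Return the name of the first spider whose ident occurs in user_agent, else None."""
    ua = user_agent.lower()
    for ident, name in spiders.items():
        if ident.lower() in ua:
            return name
    return None
```

With this approach a normal browser string such as the Firefox one quoted elsewhere in the thread matches nothing and stays a guest.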
Spider TinEye added.
This is a good list, but again you would need to click through or do an internet search to make sure these spiders are still active. I would still like to be able to ban a spider, as some do just take your images.
http://www.user-agents.org/
just added another msnbot for the verification ...
That one is not MSNBOT, it's users who have .NET installed, it was discussed earlier in the thread.
Thanks, added.
Maybe, I don't know for sure.
You are right:
http://www.webmasterworld.com/forum11/2715.htm
.net clr 1.1.4322 will be present in any IE (90%+ of the market) that is on a machine with .net FrameWork installed. I think this framework became part of Service Pack 2, so its very widespread -- you will ban innocent people.
I removed Livebot from the list.
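The Livebot case suggests a simple sanity check before approving an ident: if the proposed string also occurs in ordinary browser user agents, it will flag real visitors. A small sketch, assuming substring matching; the sample list and the `ident_is_too_generic` helper are illustrative only, and the two user agents are copied from posts in this thread:

```python
# User agents from this thread that belong to ordinary browsers
# (the first one is IE on a machine with the .NET Framework installed).
KNOWN_BROWSER_AGENTS = [
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1",
]


def ident_is_too_generic(ident, browser_agents=KNOWN_BROWSER_AGENTS):
    """Return True if the ident also appears in a known browser user agent."""
    needle = ident.lower()
    return any(needle in ua.lower() for ua in browser_agents)
```

A check like this only catches what is in the sample list, so it complements a manual look at the submission rather than replacing it.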
http://dream.epicfailed.us/ ?
Someone submitted yours I think, I thought it was you honestly.
You just look at the spider list and see if the spider is already there. If not you can submit it.
The list is updated whenever there are new spiders to approve.
Is there a way to send out a notice when it is updated by chance?
And I will keep an eye out for spiders and report them to you as I have some pesky ones show up every now and then.
I am not familiar at all with spider lists so please forgive my newbie questions; is this spider list suitable for all types of sites?
I suppose what it does is get your site spidered by the spiders that are listed in the .xml file? Wouldn't these spiders normally find your sites by themselves?
Here's a strange one:
38.104.58.118 (http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=38.104.58.118)
panscient.com
(http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=38.104.58.118)
The panscient.com was the full User Agent string, believe it or not.
Well then tell me so in the note, or you expect I discover "T312461" is an ident? Why didn't you submit only "T312461"?
If you don't like my work, then use something else. I don't have to put up with demands and anger as I'm not being paid to do this. And you talk about my attitude.
You don't owe me anything, use whatever you like ;)
To get the User Agent, in the Who's Online page there's an option at the bottom "show user-agent: yes / no". It's that simple.
Of course I'm not aiming to have India website cloning bots Jose...
nothing, it doesn't work ... however, Dream i submitted the 'new' agent of the snapbot with its IPs ... check it out ... thx
I quoted one already and there are many more on the net. Use the search like I did. MRSPUTNIK is spammers, no matter how you try to justify it. Are you one of them?
Great to hear that :)
No problem. As I said, I need the User Agent of the spider, not the IP. This is a sample user agent:
Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
I don't know how the old xml can get more spiders, as this one has all spiders the old one has. But it's your call and you may be right, if you find out why please tell me.
Also, the Omgili spider was added and the file updated.
Why does the Accoona one keep showing up as a guest now?
I'll try your file again. I owe you that much.
How do I go about getting the user agent stuff? When I resolve it it doesn't come up with that stuff.
I thought it was already in there but it came in as a guest so I thought I should report it.
Here you go.
WebAlta Crawler/2.0 (http://www.webalta.net/ru/about_webmaster.html) (Windows; U; Windows NT 5.1; ru
Before you roll your eyes in that sarcastic tone, maybe you ought to check this out and learn how to use an IDENT properly.
http://www.botsvsbrowsers.com/details/46560/index.html
T312461 is what you need.
added Attributor and YandexBlog
rofl :)
Means he recognizes his fault and went: EPIC FAIL
Nice job though Dream, I am looking forward to an updated version.
I installed your list of spiders about a week or so ago and now I see all sorts of the things that I had never heard about before! "Long, thin, slimy ones; Short, fat, juicy ones, Itsy, bitsy, fuzzy wuzzy spiders." Ok, ok, that is supposed to be regarding worms, but I thought it was appropriate here. :)
is mail.ru a spam site or it's "ok"?
mail.ru is one of the biggest webmail services in Russia, just like hotmail.com or yahoo.com in the US.
Here's one for you to add to the list:
Someone else submitted it, and I approved it.
if i see it, ill let you know. :)
Ok, added EnaBall spider.
Many thanks Dream!
It's hard to believe that the spiders only like my site. ;)
I don't know how you notice them all!
Will do.
Here is another one:
64.13.138.6 (http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=64.13.138.6)
Mozilla/5.0 (compatible; ScoutJet; +http://www.scoutjet.com/)
Just submitted this one
Charlotte
Mozilla/5.0 (compatible; Charlotte/1.0b; http://www.searchme.com/support/) (2549df004ae664faef17dce174913cea
http://www.searchme.com/support/pages/spider.php
info@searchme.com
This is pretty much why I do not care about having this renewed: you think you have good bots and don't care that some bots do harm, for example by taking any images you have and then posting them somewhere else.
I was thinking more something like this, but it would seem vBulletin is moving away from using or adding an area to show names of banned spiders coming to your board.
http://www.vbulletin.com/forum/showpost.php?p=1496224&postcount=616
Thanks, I added it to the list.
New one not picked up.
*submitting*
c0c.entireweb.com
Speedy Spider (http://www.entireweb.com/about/search_tech/speedy_spider/)
Cool :) don't forget to submit some spiders if you know of any.
Dream could offer a spider.xml file for the FAQ for admins to import. And instructions on how to change a phrase to link from who's online to the faq explanation. Might be cool for some sites. (just brainstorming)
I might not be the best person to create a FAQ about spiders, I'm just the guy who coded the system. I have to confess I don't know exactly what the "ident" is, if anyone would be so kind to explain.
Also those spider types (rss, search) the old xml has, anyone knows if vBulletin uses that info?
I am not familiar at all with spider lists so please forgive my newbie questions; is this spider list suitable for all types of sites?
I suppose what it does is get your site spidered by the spiders that are listed in the .xml file? Wouldn't these spiders normally find your sites by themselves?
can't help you with the first question, but to answer your second one ...
it doesn't help get your site spidered, it simply identifies the spiders/bots that are coming to your site. :)
Sorry I was really busy this week.
I updated the list, but deleted the toolbars and plugins as they aren't spiders.
Boofo, the accoona user agents you are submitting only contain common stuff that any user agent can have. For vbulletin to be able to detect a spider the Ident must have just the string that is unique to the spider. Like this:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Deepnet Explorer 1.5.0; .NET CLR 1.0.3705)
The detection ident is
Deepnet Explorer
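One way to go from a full user agent to a usable ident is to throw away the tokens every browser sends and keep what is left. The sketch below is only an illustration of that idea: the `COMMON_PREFIXES` list is an assumption and far from complete, and `candidate_idents` is a made-up helper, not part of the spider list system.

```python
import re

# Token prefixes that show up in ordinary browser user agents (illustrative, not exhaustive).
COMMON_PREFIXES = ("mozilla", "compatible", "msie", "windows", ".net clr")


def candidate_idents(user_agent):
    """Return the tokens inside the first parentheses that look unique to the client."""
    match = re.search(r"\(([^)]*)\)", user_agent)
    if not match:
        return []
    candidates = []
    for token in match.group(1).split(";"):
        token = token.strip()
        if not token or token.lower().startswith(COMMON_PREFIXES):
            continue
        # Drop a trailing version number such as " 1.5.0" so the ident survives upgrades.
        candidates.append(re.sub(r"\s+[\d.]+$", "", token))
    return candidates
```

For the Deepnet Explorer string above this leaves just "Deepnet Explorer", while a plain IE string with only common tokens produces an empty list, which is the situation described for the Accoona submissions.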
Accoona does the same thing. They have figured out how to squeeze by. ;)
Here are two that are HAMMERING MY SITE.:
c138.cyan.fastwebserver.de <--This one the prefix changes color ( purple.fastwebserver.de..etc etc)
and
ns.km31707.keymachine.de
I have no idea what they are but I have blocked the IP addresses and they STILL keep hammering away.
HELP!!
-Dave
The other one's ident is only 'omgilibot'.
Great idea, and job!!! I will definitely be downloading and using them, :) Thanks.
It would be nice if this thread was stickied.
I won't bother replying that. Re-submit your spider and I'll add it.
I already am subscribed. ;)
Let me know if I can help in any way.
Actually, the ones you are submitting appear on my site too :)
But yeah, I'm a very lazy guy :p
I have no idea what this is for. Forgive my ignorance, but what does this do for a forum? Does it just help vbulletin list the correct bot that is sucking content from the forums?
sweet thanks!
If people expect I code and host the system AND add 200 spiders the project dies here.
If Stadler had had that attitude it never would have started in the first place. If it dies, it dies. I was just letting you know what it takes to get people motivated.
Well, I did not know that. How do I know if it has already been submitted? Better to submit it than not, right?
How often do you update the spiders xml for download?
The types are: blog crawler, RSS crawler or crawler etc.
See the definitions in the XML file in this thread for more information http://www.vbulletin.com/forum/showpost.php?p=565415
just updated the spider.xml nothing is changed ... (ok i'll wait a bit) however check the pic attached ... it's a weird '+' before the 'http' ...
that's arguable, people may want to see how many people are using that service
Then I think MRSPUTNIK needs to be re-added as I originally reported. As you can see, the link I posted says what it is. His argument pretty well re-enforces my original report.
Well, I decided it shouldn't, for some reason at the start of the project.
Most info on that was wrong I have a hunch too. I think the ones he didn't know were classified searchspiders. I considered later adding spider type, but haven't got to it, and thought having the spider website was enough.
Also I'm not sure vbulletin uses the spider type field.
I think it's because I have no real life except for catching spiders.
Here's another one for you. This one came in as a Guest.
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.12) Gecko/20080201 Dealio Toolbar 3.1 Firef
this one doesn't look like a spider
Ok, added
I do apologise, I use the spiders.xml to tell my guests who have visited plugin (http://www.vbulletin.org/forum/showthread.php?t=131314) which visitors are spiders and as this one doesn't identify as a spider, I assumed that it wasn't in the latest list I downloaded.
Here's another one:
93.103.33.238 (http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=93.103.33.238)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MRA 4.6 (build 01425); MRSPUTNIK 1, 5, 0, 19
And here is a link talking about what it is:
http://www.webhostingtalk.com/showthread.php?t=660662
It's wrong to include this user agent in the spiders list because this is a normal user with the mail.ru Agent application installed (something like Google Talk). Many people from Russian-speaking countries install mail.ru Agent. Please remove it.
The bad thing is I just caught another spider posing as a Guest. The IDENT string was normal, but after following the link and resolving it, it turned out to be a spider site. I posted it in the thread where you were asking about the spider IP. We have got to figure out a way to catch the ones that bypass the IDENT string check.
With the type of site I have with such low traffic, it is easier to catch them on there than it is on a much busier site. Most of them have no business there as to what the site is about.
I'd be interested in seeing what you come up with for that hack. If you need any help, let me know. And good to see you again, sir. ;)
Actually I meant no one seems to care about this so far. I thought the first half-hour after I posted this would go:
- dream you rock!
- YAY!!!
But it's fine, I only lost one night's sleep over this, so no biggie.
Never mind, Thanks are not the only reason behind why we do work.
Oh ok :) cheers
Did you bother to check the link and this? MRSPUTNIK
That is no regular user.
Was it different than what was already in there? I didn't look.
Yeti/1.0 added.
You sure it's a spider? Could be someone on the offices where the IP is used I think. Say an employee from accoona is surfing. Maybe unlikely though, depending on what your site is.
Never mind for the type of the crawler, or never mind as I shouldn't mind people not coming into this thread?
Then I think MRSPUTNIK needs to be re-added as I originally reported. As you can see, the link I posted says what it is. His argument pretty well re-enforces my original report.
Then we should add that MSN .NET service too by that logic...
Google Spider
Searching Forums
User: Beerman1 (http://www.monstermayhem.org/forums/member.php?u=79)
Yahoo! Slurp Spider
Viewing Who Posted
sweet find (http://www.monstermayhem.org/forums/showthread.php?t=4065)
Why do Google spiders show up with a physical user of the site? The others only show the spider and what thread it is picking up which is what I assume is the way it is supposed to work.
Well, where's Stadler? I'm not motivated to add 200 spiders alone, and I'm just letting you know. If you think my attitude is wrong so be it.
This just came in as a guest:
194.90.190.48
omgilibot/0.3 +http://www.omgili.com/Crawler.html
Added spider Panscient, thanks.
In my forums it shows as Google Spider. Are you sure the user name isn't on the field "activity"? As in, searching for posts of user XX?
This was a simple cut and paste from the Who's Online list.
Could someone please provide me with info on how to gather the data needed to submit a spider? I am using the most up to date xml file and when I view guests on my forum, I see a list of IP addresses. Once I click on some of them, they resolve to names with the word "spider" in them.
With that said, I'm assuming they are not listed within the XML file I am using and they are a bot.
How do I gather enough info on a spider to submit it?
Thanks guys
I forgot about that on the Who's Online page. I never use it. ;)
You are welcome Alfa1 :)
No you just upload it to includes/xml/.
Just don't forget to not overwrite it when you upgrade your vb.
You told me to submit the IDENT. That is what I did. All I did to find out it was a legit spider was to do a Google search for T312461. Aren't you even doing that?
I don't like your work because you don't bother to check them out. Or if you are, then you need to find another source. You whined because no one was submitting spider info. I start submitting them and you reject them. What sense does that make? Just because you are doing a spider list, doesn't make you an expert or any better than the rest of us. Come down off of your high horse and join the rest of the world.
Just submitted this one
Charlotte
Mozilla/5.0 (compatible; Charlotte/1.0b; http://www.searchme.com/support/) (2549df004ae664faef17dce174913cea
http://www.searchme.com/support/pages/spider.php
info@searchme.com
This was submitted a few posts back.
Ok, but that ident has no unique string from Accoona. vBulletin won't be able to detect it with that.
Another one.
EnaBot/1.2 (http://www.enaball.com/crawler.html)
Another one:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Google Wireless Transcoder;)
Just added three:
Windows RSS Windows-RSS-Platform/1.0
Windows RSS Windows-RSS-Platform/2.0
Mozilla/4.0 (vBSEO; http://www.vbseo.com)
I've taken the windows platform off of the Windows RSS ident string as that would unnecessarily bloat the number of idents needed and I'm not sure anyone would care whether it's XP or Vista that the RSS requests are coming from.
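Dropping the platform part can be done mechanically before submitting. The sketch below assumes the full user agent ends with a parenthesised platform section, for example `Windows-RSS-Platform/1.0 (MSIE 7.0; Windows NT 5.1)`; the exact strings the poster saw are not quoted in the thread, so treat those sample values and the `strip_platform` name as assumptions:

```python
import re


def strip_platform(user_agent):
    """Drop a trailing parenthesised section such as '(MSIE 7.0; Windows NT 5.1)'."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", user_agent)
```

Two requests that differ only in the Windows version then collapse to the same ident, so one entry per RSS platform version is enough.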
I would think Google has a list of spiders of their own, and don't show adverts to those spiders on googlesyndication.com, but I'm not sure.
Not that I know of, sorry. Stadler's XML had IP entries, but I'm not sure if vBulletin can detect spiders by IP.
To be honest, I'm curious now and I'm gonna ask it in the support forums. If vBulletin does detect spiders by IP, I'll add spiders IPs to the system.
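For anyone who wants to experiment with IP matching outside vBulletin, a lookup could look roughly like the sketch below. Whether vBulletin itself can match spiders by IP is exactly the open question here, so this is a standalone illustration: the `SPIDER_NETWORKS` table and `spider_for_ip` helper are hypothetical, and the single entry is the Accoona address reported earlier in the thread.

```python
import ipaddress

# Hypothetical table of spider name -> networks; one address from this thread as an example.
SPIDER_NETWORKS = {
    "Accoona": [ipaddress.ip_network("209.212.73.133/32")],
}


def spider_for_ip(ip, table=SPIDER_NETWORKS):
    """Return the spider name whose networks contain ip, else None."""
    addr = ipaddress.ip_address(ip)
    for name, networks in table.items():
        if any(addr in net for net in networks):
            return name
    return None
```

An IP match says where a request came from, not what sent it, so a person browsing from the same address would be labelled a spider too.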
I think it's very cool that you did this, Dream. Don't forget that it is the weekend and people may be out and about at the time.
ok removed mrsputnik and added google mobile spider
Thank you.
Another user agent to be removed (IMHO) is Google Wireless Transcoder. In fact it's not a crawler, but a service (http://google.com/gwt/n). So, even if the connection is coming from Google's IP, there's a user browsing forum thru this service.
Sorry this goes beyond my interest in the problem, but if you get someone to do it I'll add IPs to the system.
No problem, I'll take care of it on my own.
Here are some regular bots on my boards:-
Did you already add these to the spiders list then?
Thanks, added.
Ok I added it, wasn't it on there already though? It was there with a different ident, it wasn't being picked up?
I added some now, hope I filled out everything correctly.
I think you should add some on your own as well...
I approved the 4 you sent
I'm adding the ones I find in my forum
Thanks, yes maybe you are right :)
I use the old spiders list (http://www.vbulletin.com/forum/showpost.php?p=565415) on my forums, and I'm adding them to the new list as I see them appear on my forums, just so you know. Please only submit spiders from the old list if you know they still exist.
Add this to the top of your robots.txt file:
I'd rather just checkmark a spider instead of the tedious task of typing it out.
I do know about using that method and .htaccess, I'd just rather have it shown in the AdminCP and choose which spider can access which area, or block any of them from the whole site.
First. is that fairly accurate? And second, wanna share the xml file for it? ;)
Some are dead links and so goes for what I have posted...
This accoona-a133.client.pins.net is showing up as a Guest. It was added in your last update as Accoona, which I am using. It is not showing as a Spider. The IP address is 209.212.73.133. Maybe taking that extra code, as you call it, out of the old spiders xml might not have been such a good idea? I don't have my setup resolving IP addresses.
Added spider Internet for learning.
Thanks, that bot was in the list, I just updated the Ident.
Sorry, but the Accoona one I gave you the IDENT string for straight from the Who's Online. Maybe that's how it slips by the robots.txt file.
thank you dream for the latest lists. :)
Thanks zappsan :)
I think "woriobot" catches both "woriobot heritrix" and "woriobot", so I removed the heritrix one. I'm not 100% sure of that though, so if you see it again please let me know.
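A rough way to check that kind of overlap, assuming detection is a case-insensitive substring match on the user agent (an assumption, not something confirmed from the vBulletin source): any ident that contains a shorter ident from the same list can never match on its own first, so it is redundant. The helper name below is hypothetical.

```python
def redundant_idents(idents):
    """Map each ident to another ident from the list that it contains.

    Assumes case-insensitive substring matching, so the contained ident
    already catches every user agent the longer one would catch.
    """
    redundant = {}
    for ident in idents:
        for other in idents:
            if other != ident and other.lower() in ident.lower():
                redundant[ident] = other
                break
    return redundant


print(redundant_idents(["woriobot", "woriobot heritrix", "omgilibot"]))
# {'woriobot heritrix': 'woriobot'}
```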
http://dream.epicfailed.us/ ?
What do you mean?
Well, all I know is I get 4 to 5 guests at a time and I know they are spiders as they don't do anything. What spiders they are, I don't know. As far as any that don't exist any more in the listing, that won't hurt leaving them in there for now. But someone needs to update the list, even if only a few at a time.
yep
Here's another one:
93.103.33.238 (http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=93.103.33.238)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MRA 4.6 (build 01425); MRSPUTNIK 1, 5, 0, 19
And here is a link talking about what it is:
http://www.webhostingtalk.com/showthread.php?t=660662
You ought to learn to read closer. The last spider I submitted was not the FunWebProducts one that you are referring to. That issue is done. For the spider I posted, I also posted the link as proof.
Ok, I apologise.... I must have mis-interpreted the conversation.
Here are some regular bots on my boards:-
85.225.137.240
Mozilla/4.0 (BejiBot Crawler 1.2a)
88.131.106.7
Speedy Spider (http://www.entireweb.com/about/search_tech/speedy_spider/)
82.80.252.110
BoardTracker (http://www.boardtracker.com/spider.html) (Mozilla/4.0 compatible; MSIE 6.0; Linux Cent
61.247.217.36
Yeti/0.01 (nhn/1noon, yetibot@naver.com, check robots.txt daily and follow it)
209.11.177.198
Mozilla/4.0 (compatible; BOTW Spider; +http://botw.org)
142.166.3.122
R6_CommentReader(www.radian6.com/crawler)
http://www.cuill.com/twiceler/robot.html
I added some new ones which I came across today.
I also resubmitted one, it had a different ident than the first time I've seen it (I've explained it in the notes field).
Added
Begun Robot Crawler
Mail.Ru
Not sure if this has been reported ....
crawl2.nat.svl.searchme.com
Mozilla/5.0 (compatible; Charlotte/1.0b; http://www.searchme.com/support/)
And added this one too:
NetNewsWire/3.1b4 (Mac OS X; Lite; http://www.newsgator.com/Individuals/NetNewsWire/)
I quoted one already and there are many more on the net. Use the search like I did.
You quoted a thread with a link to the mail.ru Agent, nothing more. Also, as I previously explained, MRSPUTNIK is a string inserted into the browser user-agent when the mail.ru Agent software is installed (it's not spyware, not malware, not a virus, and not spamware). Just FYI: mail.ru is a Russian top-10 portal, getting about 15,000,000 unique visitors per month. And here is the home page of mail.ru Agent (http://agent.mail.ru/en/) (in English).
MRSPUTNIK is spammers, no matter how you try to justify it. Are you one of them?
Hmmm, some people are so brilliant, nothing can escape them :rolleyes:
This is sweet. Thanks for the hard work, man.
this one doesn't look like a spider
I wasn't really sure on that one but I thought it better to report it than take the chance it might be a spider.
Just added these two:
Yodao Mozilla/5.0 (compatible; YodaoBot/1.0; http://www.yodao.com/help/webmaster/spider/; )
Mac OS X RSS Apple-PubSub/59
Well I do, it keeps with the setting, :)
Spiders appear anyways, so I figured I should put them to good use, :cool:
I just remembered when I first started separating out the guests and spiders count on my forum and people were wondering what spiders were. I'm trying to imagine what they would think if I changed the spider names to "goblins" and "orcs". Sounds like some fun for October though.
The livebot spider entry seems to be catching anonymous proxies. I confirmed this by visiting my site through a webpage that offers an anonymous browsing service and, sure enough, it identified me as a livebot spider. I'm also afraid that the ns.km31707.keymachine.de spider that was submitted a few posts back is actually a spambot.
This is a good list, but again you would need to click through or do an internet search to make sure these spiders are still active. I would still like to be able to ban a spider, as some do just take your images.
http://www.user-agents.org/
Add this to the top of your robots.txt file:
User-Agent: Googlebot-Image
Disallow: /
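If you want to see what that rule does before uploading it, Python's standard urllib.robotparser can evaluate it offline. This is only a sketch: the two paths are made-up examples, and it shows how a well-behaved crawler should read the file, not what any particular bot actually does.

```python
from urllib.robotparser import RobotFileParser

# The two lines suggested above, exactly as they would sit in robots.txt.
ROBOTS_TXT = """\
User-Agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if the given agent may fetch the path under ROBOTS_TXT."""
    return parser.can_fetch(agent, path)


# Hypothetical paths, used only for illustration.
print(allowed("Googlebot-Image", "/images/logo.gif"))  # False: blocked everywhere
print(allowed("Googlebot", "/forums/index.php"))       # True: no rule applies to it
```

Keep in mind that robots.txt is only a request; a spider that ignores it still has to be blocked some other way, such as .htaccess.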
nice job Dream :)
I have a little question about this: yesterday I upgraded vB and unfortunately forgot to delete the newer file from the vB package, so today spiders are not detected correctly. I have just uploaded your latest file, but I saw that it doesn't recognize some spiders, like 'snapshots', that your older files did recognize. So the question is: how long does it take before the system runs 'correctly'?
Another one:
Deepnet Explorer
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Deepnet Explorer 1.5.0; .NET CLR 1.0.3705)
Got another one that just hit the site along with 25 yahoo spiders.
69.90.42.67 (http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=69.90.42.67)
Mozilla/5.0 (compatible; OWPBot/0.3; http://www.openwhitepages.com/)
That's arguable; people may want to see how many users of that service are there.
A second Yeti bot was added.
you are the spider/bot God :cool:
just added another msnbot for the verification ...
That last one, Boofo, is just some dodgy IE plugin, a bit like mywebsearch or some other nasty. Definitely not a spider ;)
I checked out their site and it looked like a spider to me. You may be right, I don't know. Better to be safe than sorry in reporting it. ;)
Is anyone still updating the list?
Maybe, I don't know for sure.
You are right:
http://www.webmasterworld.com/forum11/2715.htm
I removed Livebot from the list.
oops ... nevermind about the last post. :D
I changed the Ident from http://www.omgili.com/Crawler.html to omgilibot, tell me if it still doesn't work.
Another one.
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; BCD2000; SV1; FunWebProducts)
added soso
Added GurujiBot.
Here's another one that showed up as a Guest:
194.90.190.48 (http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=194.90.190.48)
omgilibot/0.3 +http://www.omgili.com/Crawler.html
Congratulations! Although there are many repeat ones, I'm sure. ;)
Did you bother to check the link and this? MRSPUTNIK
That is no regular user.
Yes, I did. If MRSPUTNIK is included in the spiders list we will get wrong results, because it's not really a bot, but a regular user just browsing the forum with mail.ru Agent (and some other products from mail.ru) installed.
The spider I was referring to in the post you quoted was just added by Dream. ;)
I really don't care if Dream added it or not... it's not a spider, period.
If he chooses to add it, or not add it, despite the glaring evidence it is NOT a spider, then that's totally up to him.
A massive part of my job involves checking log files, and I stand to be corrected only if you can post URLs proving this to be a spider.
As far as I am concerned, I've proved it isn't. If you wish to call black white, then please post the proof.
Sorry I didn't notice that was the IP.
There's one Accoona in the xml, but I never said it was duplicated, did I? Anyway, when you see it as a guest in your forums again, paste the User Agent for me, ok? So I can fix it.
edit: oh yes, someone else submitted it for you actually.
And nobody said you said any such thing. What is that all about?
Here is another one for you, but this time only the IP showed and it was showing as a Guest. It is explained in the quote box.
omgilibot
http://www.omgili.com/Crawler.html
Here is the IP that shows:
194.90.190.48
I got the Ident info from doing a var_dump for something else. The IP is all that showed for this and it was showing as a guest. The IP did NOT resolve to anything other than itself. The ident came from the var_dump.
In these next ones, the first one shows up fine. The next three show up as guests. And your accoona-a133.client.pins.net is still showing up as a guest.
livebot-65-55-209-98.search.live.com - MSNBot Spider
livebot-65-55-165-117.search.live.com - http://search.msn.com/msnbot.htm
livebot-65-55-165-52.search.live.com - http://search.msn.com/msnbot.htm
livebot-65-55-165-42.search.live.com - http://search.msn.com/msnbot.htm
I think I'm going back to Stadler's version as it caught a lot more spiders than this version does. I'll just add them to that as I find them. Good luck!
Thanks, added.
Added SiteVibeBot
Can anyone confirm this one gets detected? I'm trying to understand vB's regex for the IDENT string (because if I ask in the How To forum no one will know).
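For anyone who wants to experiment while waiting for an answer, here is a Python sketch of one plausible matching scheme. It assumes, without having confirmed it against the vBulletin source, that the idents are escaped, joined into one alternation and searched case-insensitively in the user agent. The function names are hypothetical.

```python
import re


def build_spider_regex(idents):
    """Join the idents into one case-insensitive alternation.

    Every ident is escaped so that characters such as '.', '/' or '+'
    are matched literally and not as regex syntax.
    """
    alternation = "|".join(re.escape(ident) for ident in idents)
    return re.compile("(" + alternation + ")", re.IGNORECASE)


def detect_spider(user_agent, pattern):
    """Return the part of the user agent that matched an ident, or None."""
    match = pattern.search(user_agent)
    return match.group(1) if match else None


pattern = build_spider_regex(["omgilibot", "YodaoBot", "Charlotte/1.0b"])

# User agents reported in this thread.
print(detect_spider("omgilibot/0.3 +http://www.omgili.com/Crawler.html", pattern))
print(detect_spider("Mozilla/5.0 (compatible; Charlotte/1.0b; "
                    "http://www.searchme.com/support/)", pattern))
# An ordinary Firefox user agent matches no ident.
print(detect_spider("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; "
                    "rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1", pattern))
```

Under this assumption a short, distinctive ident such as omgilibot is enough, and a full URL as ident only works if the user agent contains that URL character for character.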
So, I had 12 other regular people using mail.ru on my little 65-member site at the same time that same day? Not likely.
This is what I submitted and you said it had already been submitted. It is in the current xml file.
accoona-a133.client.pins.net
but it shows up as a guest, not a spider.
Dream could offer a spider.xml file for the FAQ for admins to import. And instructions on how to change a phrase to link from who's online to the faq explanation. Might be cool for some sites. (just brainstorming)
thanks
It has CAPTCHA! I'm so proud of myself :P
Thanks buro9, your submissions were added and are greatly appreciated.
Yeah I agree with you, it's most likely a spider, depending on the occurrence and URL it's visiting.
There's not much I can do but to whine for them to add IP recognition in the suggestions forum though.
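Until something like that exists, a hack could match on the resolved hostname and not on the user agent. The Python sketch below only shows the matching step and is not vBulletin code: the function name and the table are hypothetical, with entries taken from hostnames reported in this thread. A real setup would first have to resolve the IP to a hostname (for example with socket.gethostbyaddr), which needs network access and is left out here.

```python
def spider_from_hostname(hostname, suffix_map):
    """Return the spider name whose hostname suffix matches, or None.

    A suffix matches only on a label boundary, so "search.live.com"
    matches "livebot-1.search.live.com" but not "notsearch.live.com".
    """
    host = hostname.lower().rstrip(".")
    for suffix, name in suffix_map.items():
        suffix = suffix.lower()
        if host == suffix or host.endswith("." + suffix):
            return name
    return None


# Illustrative table only; the names come from reports in this thread.
HOST_SUFFIXES = {
    "search.live.com": "MSNBot",
    "accoona-a133.client.pins.net": "Accoona",
}

print(spider_from_hostname("livebot-65-55-165-117.search.live.com", HOST_SUFFIXES))
print(spider_from_hostname("accoona-a133.client.pins.net", HOST_SUFFIXES))
print(spider_from_hostname("dsl-12-34.example.net", HOST_SUFFIXES))
```

Reverse DNS can be set to anything by whoever controls the IP block, so a careful version would also resolve the hostname forward again and compare it with the original IP.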
There has to be a way around this even if it means us coming up with a hack to do it. Are you game?
Submitted dragonfly
ebingbong#playstarmusic.com (though they say the # is an @?)
http://www.ebingbong.com/help/ourRobot.php
Yes I use that too. Were you using the latest XML?
Downloaded again last night to double check and the searchme spider still shows as a regular guest. :) Also upgraded to the latest release of product to make sure that it wasn't at fault, but made no difference.
This of course has not too much to do with this thread and I would like to say how much I appreciate you keeping the spider list updated. :)
Added spider Panscient, thanks.
Google Spider
Searching Forums
User: Beerman1 (http://www.monstermayhem.org/forums/member.php?u=79)
Yahoo! Slurp Spider
Viewing Who Posted
sweet find (http://www.monstermayhem.org/forums/showthread.php?t=4065)
Why do Google spiders show up with a physical user of the site? The others only show the spider and what thread it is picking up which is what I assume is the way it is supposed to work.
See my cut and paste at the top of my post. Google Bot shows as user "Beerman". I was wondering why it lists as a user and not like the Yahoo Bot, which is just viewing.
In my forums it shows as Google Spider. Are you sure the user name isn't on the field "activity"? As in, searching for posts of user XX?
Trust me, it's a spider. Check out my site URL and tell me what you think then.
http://www.fathers-rights-forums.com/forums/
Then you and I will form a plan to tackle this dilemma. ;)
It should detect as soon as you upload the new file.
Cool. Sorry, IM as in instant messenger feeds?
No problem. As I said, I need the User Agent of the spider, not the IP. This is a sample user agent:
Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
I don't know how the old xml can get more spiders, as this one has all spiders the old one has. But it's your call and you may be right, if you find out why please tell me.
Also, the Omgili spider was added and the file updated.
Do you want all spiders or just the good ones on that list of yours.
the second :)
Great idea Dream, I'll be using this for other projects rather than just vB :)
Actually I meant no one seems to care about this so far. I thought the first half-hour after I posted this would go:
- dream you rock!
- YAY!!!
But it's fine, I only lost one night's sleep over this, so no biggie.
You may add:
- the type of the crawler.
- how many spiders your file contains.
- when last updated.
Regards
I added both; as for the type of the crawler, I'm not sure what you mean.
If you have more suggestions let me know.
Here's another one for you. This one came in as a Guest.
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.12) Gecko/20080201 Dealio Toolbar 3.1 Firef
Well, according to what I have read, MRSPUTNIK is spammers. It's up to Dream, it is his list.
That is why I am using the latest one he did. It may be old and outdated on a few Spiders but it does recognize a lot of them. That is better than nothing in my book. It isn't like the old days when everyone jumped in for a common purpose. No one said you had to add 200 spiders. All I said was give them an update, even if only with 10 new spiders, to whet their appetite. If you aren't willing to do that, then scrapping it is probably your best bet.
That's a pretty good idea. I've always been looking for an updated spiders list.
I might not be the best person to create a FAQ about spiders, I'm just the guy who coded the system. I have to confess I don't know exactly what the "ident" is, if anyone would be so kind to explain.
I'm not sure about the ident either. What would I have to put in there? It says user-agent, should I just put the info here which is displayed when I choose to display the user agent?
I am willing to help if I'm sure about what I should put into the fields.
That last spider I submitted IS a spider. I checked out their web site. No sense in submitting them if they are going to get ignored.
Boofo, I beg to differ greatly. This is the scumware that is mywebsearch IE toolbars and their many variants. Why else would they identify in many different IE versions?
I would love to see the source for your information...
Here are my sources:-
http://www.webmasterworld.com/forum39/1510.htm
http://www.seroundtable.com/archives/001430.html
There are even lots of pages across the net detailing how to get rid of said 'scumware'. Here is just one of them:-
http://www.liamdelahunty.com/tips/fun_web_products.php
Well, according to what I have read, MRSPUTNIK is spammers. It's up to Dream, it is his list.
Well, please quote your sources. Anyway, it's possible that guys running browsers with this user-agent string spam some forums, but that's true of any user-agent string.
Another: MLBot (www.metadatalabs.com)
Heya,
I made a system where people can submit spiders and download updated spiders_vbulletin.xml files for their forum. After you submit a spider I must approve it for it to be included in the list.
Hope this helps everyone.
http://spiderlist.codeforgers.com
thank you great work mate :D
The spider I was referring to in the post you quoted was just added by Dream. ;)
That was quick. And you're welcome. ;)
So, I had 12 other regular people using mail.ru on my little 65-member site at the same time that same day? Not likely.
Boofo, I don't know why there were 12 regular people browsing your little site, but I know for sure that this user-agent string belongs to regular users. If you bother to check user-agents.org for MRA and MRSPUTNIK now, you will see that both were dropped from the listing.
Update: user-agents.org still reports MRA, but as a regular browser.
I added some now, hope I filled out everything correctly.
I think you should add some on your own as well...
Someone submitted yours I think, I thought it was you honestly.
You just look at the spider list and see if the spider is already there. If not you can submit it.
The list is updated whenever there are new spiders to approve.
Is mail.ru a spam site, or is it "ok"?
It's hard to believe that the spiders only like my site. ;)
Yes I can't find a good spider list even in Google.
Excellent, that was exactly what I was looking for.
By the way. It is recommended to install this mod:
http://www.vbulletin.org/forum/showthread.php?t=152321
To remove spiders from the "Currently Active Users" list.
Cheers,
Gabriel.
I'm hoping for all existing ones.
That last spider I submitted IS a spider. I checked out their web site. No sense in submitting them if they are going to get ignored.
Ok, I added it exactly how you submitted it.
It should detect as soon as you upload the new file.
Dream, the snapshots spiders aren't being detected (48 hours have passed). I know this because I know its IP perfectly well. :)
http://www.comicguide.net/images/smilies/lolrot.gif