/pol/ - Politically Incorrect


Thread archived.


File: IMG_0006.png (592 KB, 3420x1956)
Posted yesterday but saw some people were interested - I will open source this and host it online once I have implemented a few more things. I have set up a Python script to scrape every /pol/ post so I can properly assess shilling on this board. I have put together an interface so I can see who talks about what the most any time something interesting happens, like if some new Epstein files drop and suddenly a bunch of Indians make BBC threads or something. This is just the data collection stage at the moment. I intend to get averages of who talks about what each day over months and then compare them to the averages of discussions when something happens. Each scrape is 2 MB and about 10,000 posts. I have it running 24/7 and it takes a scrape every 25 minutes. I have it on a 2 TB HDD at the moment and estimate about 40 GB a month, so I should be able to get a year's worth, which should give me enough data for a proper statistical investigation.

It's obvious this board is shilled to fuck. I would like to both qualify when and quantify to what extent. I will keep you posted.
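
The storage figures in the post can be sanity-checked with a few lines of arithmetic. This is only a sketch using the numbers quoted above (2 MB per scrape, one scrape every 25 minutes, a 2 TB disk, an estimated 40 GB a month); the function and variable names are made up for illustration, and a 30-day month and decimal units (1 GB = 1000 MB) are assumptions. On those figures the raw scrapes alone come to roughly 3.5 GB a month, so the 40 GB estimate presumably counts something beyond the 2 MB scrape files. What that is, the post does not say.

```python
# Figures quoted in the post. The names are hypothetical, chosen for this sketch.
SCRAPE_SIZE_MB = 2          # size of one scrape
SCRAPE_INTERVAL_MIN = 25    # one scrape every 25 minutes
DISK_TB = 2                 # capacity of the HDD
QUOTED_GB_PER_MONTH = 40    # the poster's own monthly estimate


def scrapes_per_day(interval_min):
    """Number of scrapes taken in 24 hours at a fixed interval."""
    return 24 * 60 / interval_min


def monthly_gb(scrape_mb, interval_min, days=30):
    """Raw scrape volume per month in GB (assumes 30 days, 1 GB = 1000 MB)."""
    return scrape_mb * scrapes_per_day(interval_min) * days / 1000


def months_until_full(disk_tb, gb_per_month):
    """How many months of data fit on the disk (assumes 1 TB = 1000 GB)."""
    return disk_tb * 1000 / gb_per_month


if __name__ == "__main__":
    raw = monthly_gb(SCRAPE_SIZE_MB, SCRAPE_INTERVAL_MIN)
    print(f"scrapes per day: {scrapes_per_day(SCRAPE_INTERVAL_MIN):.1f}")
    print(f"raw scrape volume: {raw:.2f} GB/month")
    print(f"months on disk at quoted rate: {months_until_full(DISK_TB, QUOTED_GB_PER_MONTH):.0f}")
    print(f"months on disk at raw rate: {months_until_full(DISK_TB, raw):.0f}")
```

Either way, a year of data fits on the 2 TB disk with plenty of room to spare.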
>>
I say it's the Jews all the time on here because it legitimately is the fucking jews. Am I shill? Faggot.
>>
>>524172219
Nah, what I'm getting at is more this: if something happens (which it never does) and a larger influx of posters suddenly comes in, maybe from the same country, maybe not, increases post traffic, posts about the same thing or pushes some narrative, and then goes away a few days later, that can be visualised. You would expect to see a spike in users, a spike in that new term, then a return to normal posting levels and a disproportionately sharp drop in that new term. If something is organic, you should see the opposite: a wide, gradual incorporation of that term, with a roughly Gaussian distribution in post time and geolocation.
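
One way to flag the kind of spike described above is a rolling z-score on the daily share of posts that mention a term. This is a minimal sketch and not the poster's actual method: `term_share`, `spike_days`, the 7-day baseline, the threshold of 3 and the `min_sigma` floor are all assumptions picked for illustration, and the input is assumed to be a list of days, each a list of post texts.

```python
from statistics import mean, stdev


def term_share(posts, term):
    """Fraction of posts whose text contains the term (case-insensitive)."""
    if not posts:
        return 0.0
    term = term.lower()
    return sum(term in post.lower() for post in posts) / len(posts)


def spike_days(daily_posts, term, baseline_days=7, threshold=3.0, min_sigma=0.01):
    """Return indices of days whose term share sits more than `threshold`
    standard deviations above the mean of the preceding `baseline_days` days.

    `min_sigma` is a floor on the standard deviation so that a very flat
    baseline does not turn tiny wobbles into huge z-scores.
    """
    shares = [term_share(day, term) for day in daily_posts]
    flagged = []
    for i in range(baseline_days, len(shares)):
        window = shares[i - baseline_days:i]
        mu = mean(window)
        sigma = max(stdev(window), min_sigma)
        if (shares[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged
```

Calling `spike_days(days, "some term")` returns the day indices where the term's share jumps well above its recent baseline. Note that once a spike enters the rolling window it inflates the baseline for the following days, and this only detects the jump itself; telling a sharp rise-and-fall apart from a gradual uptake would need a look at the shape of the curve over a longer period.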
>>
>>524172090
>dashboard graphic designers have all been fired

RIP.

Data is a drug
>>
>>524172090
>>524172632

This is the kind of autism I fuck with
>>
I am very interested in this
please give us a way to stay updated, a twitter to follow, a blog, an email, whatever works for you
you could pick up a tripcode and I will search for posts made by you in the archives in the future
>>
>>524172972
Gotcha man. I'll host it on a website and will post the backend and frontend to a GitHub once Christmas is out of the way. I'll just leave it running until then to generate data. I'll post enough that you should know it's live.
In case anyone was wondering why I'm scraping rather than using the archives - I am certain my database is a 1:1 copy of /pol/. I cannot be sure the archives are not tampered with and desu neither can you.
>>
>>524173301
good to hear
here's another bump
>>
This thread was moved to >>>/bant/23791213


