Wikipedia:Duplicated sections

From Wikipedia, the free encyclopedia

✘ This Wikipedia page is currently inactive and is retained as a historical archive.
A historical page is either no longer relevant or consensus has become unclear. If you want to revive discussion regarding the subject, you should seek broader input via a forum such as the proposals page of the village pump.

Bug 275 (since fixed) caused entire sections of some articles to be accidentally duplicated. This page is an attempt to locate every instance of the problem and fix it.

A script was run on an offline copy of the database. First, it isolated all pages with duplicate headers. Then it sliced each remaining page into three-word "chains" or "triplets" and counted how many of these chains appeared more than once. The percentage of repeated chains is reported for each article; a high percentage is a good indication that duplication has occurred.
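The detection steps described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Perl script that was actually run: the function names, the header regular expression, and the exact definition of the percentage (distinct triplets seen more than once, divided by all triplets on the page) are assumptions made for the example.

```python
import re
from collections import Counter

# Assumed wikitext header form: "== Title ==" (two or more "=" on each side).
HEADER_RE = re.compile(r"^(={2,})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def has_duplicate_headers(wikitext):
    """Return True if any section title occurs more than once on the page."""
    titles = [m.group(2) for m in HEADER_RE.finditer(wikitext)]
    return len(titles) != len(set(titles))


def triplet_stats(text):
    """Return (repeated, total, percentage) for three-word chains in text.

    repeated   = number of distinct triplets that appear more than once
    total      = number of triplets on the page (counting repeats)
    percentage = 100 * repeated / total  (assumed definition)
    """
    words = text.split()
    triplets = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    total = len(triplets)
    if total == 0:
        return (0, 0, 0.0)
    counts = Counter(triplets)
    repeated = sum(1 for c in counts.values() if c > 1)
    return (repeated, total, 100.0 * repeated / total)
```

Under this assumed definition, a page whose whole text appears twice scores just under 50%, which would be consistent with the highest percentage range listed on this page, but that is an inference rather than something the page states.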

This list was produced from the June 26, 2005 database dump, so many of these instances have probably already been fixed. (You can use the history feature to check whether duplication actually occurred.) The following pages still need to be checked. We're not sure what a good percentage cutoff is, so start at the top and work your way down. Please strike through fixed pages and underline false positives, so we can determine whether the detection algorithm is working well and when we should stop checking. We've also included a section sorted by the absolute number of repeated triplets, in case there are long pages with small duplications. Thanks for your help!

How to update

The Perl script used is at /script.

Suggestions for improvement

  • Note that the listing for under 300 triplets is still available if it is needed, though if there is a high number of false positives, it may be more efficient to wait for a new database dump. -- Beland 06:04, 1 August 2005 (UTC)
  • It might be a useful metric to figure out what the length of the longest duplicated section is, though this will clearly be more computationally complex. Also, small differences in duplicated sections may appear over time. Another red flag might be if a repeated section starts with a header, though again, this may be corrupted over time. -- Beland 06:09, 1 August 2005 (UTC)
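The "longest duplicated section" metric suggested above could be sketched as below. This is only an illustration in Python under stated assumptions: the function name `longest_repeated_run` is hypothetical, the page is treated as a plain list of words, and only exact, non-overlapping repeats are found, so duplicated sections that have since drifted apart would be missed. It runs in quadratic time in the number of words, which reflects the extra computational cost the suggestion mentions.

```python
def longest_repeated_run(words):
    """Return the longest run of words that occurs twice without overlapping.

    Dynamic programming over pairs of positions (i, j) with i < j:
    the stored length is that of the matching run ending at word i and word j.
    """
    n = len(words)
    best_len, best_start = 0, 0
    prev = [0] * (n + 1)  # run lengths for the previous value of i
    for i in range(1, n + 1):
        cur = [0] * (n + 1)
        for j in range(i + 1, n + 1):
            # Extend the run only if the two copies would not overlap.
            if words[i - 1] == words[j - 1] and prev[j - 1] < j - i:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_start = i - cur[j]
        prev = cur
    return words[best_start:best_start + best_len]
```

For example, calling `longest_repeated_run("a b c x a b c y".split())` returns the run `["a", "b", "c"]`. Checking whether the returned run begins with a header line would be a separate step.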

By percentage

45%-50% - Done

40%-44% - Done

35%-39% - Done

30%-34% - Done

25%-29% - Done

20%-24% - Done

15%-19% - Done

11%-14%

By absolute number (10% or less)

1500-9000

1000-1499

500-999

400-499

300-399

