Wikipedia talk:AutoWikiBrowser

AutoWikiBrowser 6.2.1.0

This is the discussion page for the AWB project. It is also the place to discuss using the AWB program itself (if you need help, or have a question about AWB, etc.). Where to make specific types of reports or requests is explained in the Before you post section below. Before asking questions, please read the Frequently asked questions below.

Before you post

Where to post, depending on what you want to do:

  • Report a bug or request a feature in AWB? Use Phabricator. Check reported tasks before filing a new task. You do not need to create another account there; just log in with your normal Wikimedia account. See this MediaWiki wiki page on how to report bugs and request features on Phabricator.
    • Reporting a bug: Try to report bugs in the current version of the software. Update to the most recent version and check that your bug has not already been reported on this page. See "How to Report Bugs Effectively" for advice on how to write bug reports. Before posting anything related to non-Wikimedia Foundation wikis, verify that the site is running a recent version of MediaWiki with the Bot API enabled. Older versions of MediaWiki, and wikis without the Bot API, are not supported. Be sure to mention the exact URL of your wiki.
    • Requesting a feature: Please use the feature request button to add new feature requests. This format allows the developers to keep track of feature requests. Take some time to search the archives, both on-wiki and on Phabricator, to check whether a similar request was previously discussed.
  • Report an incorrectly fixed typo? Use Wikipedia talk:AutoWikiBrowser/Typos.
  • Request approval to use AWB? Use Wikipedia:Requests for permissions/AutoWikiBrowser.
  • Ask a question about AWB or ask for help? Use this page.

Frequently asked questions

  • When I start AWB, I get one of the following errors: "The application failed to initialize properly (0xc0000135). Click on OK to terminate the application.", or "To run this application, you must first install one of the following versions of the .NET Framework..." This error means your computer does not have version 2 of the .NET Framework installed properly. You can choose from various versions for download here, or you can run Windows Update and select version 2 of the .NET Framework from the "Optional Updates" section, if you want the choice made for you.
  • Does AWB run on Linux or Mac?
  • Does AWB work on other projects and languages? Many Wikimedia projects and languages are supported; see the "User and project preferences" option in the general menu. Other languages will be added on request, though at the moment the interface is always in English. You can also use AWB with third-party wikis: change the wiki under Options > Preferences > Site. The wiki must support the Bot API required by AWB, which means it should be running the latest HEAD version of MediaWiki or something close to it. The wmf-deployment branch is also recommended, as this is what is currently live on WMF sites.
  • Under Windows Vista and newer, AWB uses the wrong font size, which results in clipped text and lost buttons and options (see example here). How do I fix it? (Tracked in Phabricator: Task T103506)
    • Solution #1: Go to "Control Panel\All Control Panel Items\Display" and switch resizing of the fonts to 100%.
    • Solution #2: Right-click AutoWikiBrowser.exe → Properties → Compatibility tab, then enable the "Disable display scaling on high DPI settings" option or, on Windows 10 if available, select System (Enhanced).
  • Why does AWB put stubs after categories, even though MediaWiki always renders categories last? According to WP:STUB#Categorizing stubs, stub templates are by convention placed at the end of the article, after the External links section, any navigation templates, and the category tags, so that the stub category will appear last. If your wiki uses another order, please let us know here.
  • I don't like or use Internet Explorer; please use Firefox instead. AWB does not use Internet Explorer per se. It does, however, use the same web browser control (MSHTML) as Internet Explorer; the equivalent Firefox component does not provide the needed functionality.
  • How do I open the page in another browser if I can't use the one in AWB? Right-click the edit box at the bottom right of your screen and select "Open page in browser".
  • How do I edit a page that doesn't exist? Uncheck "Ignore non existing pages" in the "Skip articles" box.
  • How do I skip certain articles? Use the "Skip if contains" and "Skip if doesn't contain" options on the "Skip" tab.
  • Can't you leave up a "stable" version, so I don't have to download new versions? It is important to keep people up to date with the latest versions, because their use of the software affects not just them but the whole of Wikipedia. Since any remaining bugs should be trivial, releases will hopefully not be too frequent.
  • How can I stop AWB clicking when it changes pages? This is a Windows sound theme setting. This page explains how to turn off the clicking sound. Alternatively, delete the following key from the Windows registry: HKEY_CURRENT_USER\AppEvents\Schemes\Apps\Explorer\Navigating\.Current
  • AWB randomly crashes on page load on my system, and I always use a browser other than Internet Explorer when using Wikipedia. You may have installed custom scripts that are incompatible with IE. Wrap the contents of your monobook.js in a conditional:
// Detect IE 5.5+ (its navigator.appVersion contains "MSIE") and skip these scripts there
if (navigator.appVersion.indexOf("MSIE") == -1) {
    // Previous contents go here ....
}
  • I get Just-In-Time Debugger messages when loading AWB or loading pages. In Internet Explorer, go to Tools → Internet Options → Advanced. Make sure 'Disable Script Debugging (Internet Explorer)' and 'Disable Script Debugging (Other)' are both checked. Click Apply, then close the dialog.
  • Why does AWB run very, very slowly if I try to make changes in the edit window on larger pages, especially pages with long lists or tables? If running on Windows, exit the Speech Recognition software that is built into some versions of Windows. If you have started Speech Recognition, don't just turn it 'Off'; you must 'Exit' the software.
  • When I do a clean install of AutoWikiBrowser the application seems to find old setting data somewhere. I'd like to do a really clean install. Any ideas? Clean up your registry and remove the folder "C:\Documents and Settings\user name\Local Settings\Application Data\AutoWikiBrowser" (Windows XP) or "C:\Users\user name\AppData\Local\AutoWikiBrowser\" (Windows 7). Note that the application data folder may be hidden.
  • AWB says there is a newer version but won't update. Check the version number of your AWBUpdater.exe. The current version is 2.4.0.0. If you have an older version, download the latest AWB version and do a clean install.
  • Which .NET Framework version do I have? You can find your .NET Framework version in Help → About box.
  • Where are the default settings stored?
    • Windows XP: C:\Documents and Settings\\Local Settings\Application Data\AutoWikiBrowser
    • Windows Vista onwards: C:\Users\\AppData\Local\AutoWikiBrowser\Default.xml
  • I cannot copy text from the diff window using the Control+C keyboard shortcut. This feature requires Microsoft.mshtml.dll to be available to AWB. You can try downloading the file (a number of third-party websites offer DLL downloads) and putting it in the same folder as AutoWikiBrowser.exe. This is reported not to work for all users, presumably due to .NET Framework problems.
  • Is there any way to set AWB not to use HTTPS? (The GFW blocks port 443.) In Preferences, set the project to "custom" and set the left box to http. In the webpage box, type en.wikipedia.org/w/ (English Wikipedia) or zh.wikipedia.org/w/ (Chinese Wikipedia). Note that leaving off the /w/ will result in a "root element missing" error.
  • How do I log in to AWB with an account that has two-factor authentication enabled? Use a bot password. Despite the name, bot passwords aren't just for bots. See Wikipedia:Using AWB with 2FA.

Discussion

GENFIX error

Tracked in Phabricator: Task T293603 (Resolved)

In this diff, AWB's GENFIX set messed up an implementation of {{hatnote group}}. Could this be fixed to resolve the error? {{u|Sdkb}} talk 04:38, 5 October 2023 (UTC)

I have run into that error as well. What I saw was that when AWB seeks to replace a redirect to the template, it ungroups the contents and places the {{hatnote group}} template separately beneath what it had previously grouped. Stefen Towers among the rest! GabGruntwerk 04:46, 5 October 2023 (UTC)

It was logged as a bug a couple of years ago. -- John of Reading (talk) 06:52, 5 October 2023 (UTC)

@Sdkb, StefenTower, and John of Reading: I received an email this morning that Rjwilmsi has fixed this issue. @Rjwilmsi: What are the plans to release an updated version of AWB with this fix (and hopefully resolve a few more bugs beforehand)? Thanks!

You would need to arrange with Reedy if you think a new AWB release is worthwhile. Rjwilmsi 17:55, 5 October 2023 (UTC)

I find it weird that AWB releases seem to be done in giant versions, rather than small updates automatically pushed out. The latter seems the more modern approach. {{u|Sdkb}} talk 18:00, 5 October 2023 (UTC)

@Reedy: Could we please have an updated version of AWB soon (hopefully with a few more resolved bugs)? Thanks! GoingBatty (talk) 05:27, 6 October 2023 (UTC)

@Rjwilmsi: Is Reedy the only one who can release a new version of AWB? Reedy hasn't been very active here lately. GoingBatty (talk) 22:13, 27 October 2023 (UTC)

Effectively yes. I can do local builds, but on my setup (MonoDevelop/Linux) I can't do a full clean build, as Reedy has updated the AWB solution to use C# reference libraries etc. that MonoDevelop can't (yet) handle. Also, the AWB release process requires changes to admin-protected pages to update release versions. If Reedy doesn't respond, then I suppose I'll have to set up Visual Studio on a spare Windows machine so I can do a full build, and then hopefully we can find another admin to get the AWB version page updated. Rjwilmsi 18:27, 29 October 2023 (UTC)

This certainly points to a systemic issue. AWB ideally should be converted to an online tool (rather than a program you have to download) that can be updated whenever a fix is needed. {{u|Sdkb}} talk 19:21, 29 October 2023 (UTC)

@Rjwilmsi: Reedy (talk · contribs) hasn't been online since 1 October, and not very often all year. I reached out to Reedy on #AutoWikiBrowser, but didn't get a response, so any help you could provide would be appreciated. GoingBatty (talk) 03:30, 14 November 2023 (UTC)

It would be nice if we could figure out a way to share build duties at least, and have more regular releases. I'm a former software developer and would like to see if I can build it, but only if all the required tools are free. That is, is Visual Studio Community enough for the task? Stefen Towers among the rest! GabGruntwerk 00:41, 18 November 2023 (UTC)

I'm also a software developer, but Java rather than C#. Regardless, I'm trying to follow the instructions here. After getting through Microsoft's gross privacy-invading processes to download an old version of Visual Studio, I now have it installed. (It was by comparison much easier to install TortoiseSVN.) I've now started VS and opened the AWB project, but I don't know what this instruction means: "When the IDE has loaded, select release rather than debug (next to the green forward arrow)." If anyone can enlighten me, that would be much appreciated. Cheers, Kiwipete (talk) 02:48, 18 November 2023 (UTC)

Regex speed: find-and-replace vs. C#

Tracked in Phabricator: Task T350636

I decided to compare the speed of a find-and-replace rule with the identical rule in C#, both run on the German Empire article, thinking C# would be somewhat faster. However, I've found the exact opposite.

The following find-and-replace rule:

Find: (\=+\s*(?:(?:Foot)?Notes|Further reading)\s*\=+)((?:\s*\*?\s*\{\{\s*(?:Wik|Commons|Reflist|Refbegin|Refend|notes?list|notes|cit)*\}\}\.?|\<references\s*/\>|\s*\<ref +name+/\>|\s*\<ref +name+\>*?\</\s*ref\>|\s*\</\s*ref\>|\s*\}\}|\s*\<\!\-\-\s*(?!\{\{(?:Wik|Commons))*?\-\-\>|\s*?+*\*+)+)(\s*=+\s*See also\s*=*(?:(?:\s*\{\{(?:Portal|C?Commons|C ?cat|cc(?=\s*)|Wik|(?:col *div|colbegin|cols|div *2col|div *col *begin|div *col *start|div*col|divbegin|divided *column)*\}\}+\{\{\s*(?:col * div *end|col *end|div*col*end|div *end|end *div *col)|Columns\-list)*\}\})*))((?:\s*\*(?:\s*\{\{\s*cite+\}\}|+))*)
Replace with: $3$4 $1$2

with "Regular expression" checkbox checked, the others unchecked, "Apply No. of times" = 1, and nothing in the "If" tab, took an average of 64.75s to run over 4 runs (66, 65, 64, 64s).

The following C# module code, however, has been running (hanging) for over 30 minutes:

public string ProcessArticle(string ArticleText, string ArticleTitle, int wikiNamespace, out string Summary, out bool Skip)
{
    Skip = false;
    Summary = "Summary";
    string regex = @"(\=+\s*(?:(?:Foot)?Notes|Further reading)\s*\=+)((?:\s*\*?\s*\{\{\s*(?:Wik|Commons|Reflist|Refbegin|Refend|notes?list|notes|cit)*\}\}\.?|\<references\s*/\>|\s*\<ref +name+/\>|\s*\<ref +name+\>*?\</\s*ref\>|\s*\</\s*ref\>|\s*\}\}|\s*\<\!\-\-\s*(?!\{\{(?:Wik|Commons))*?\-\-\>|\s*?+*\*+)+)(\s*=+\s*See also\s*=*(?:(?:\s*\{\{(?:Portal|C?Commons|C ?cat|cc(?=\s*)|Wik|(?:col *div|colbegin|cols|div *2col|div *col *begin|div *col *start|div*col|divbegin|divided *column)*\}\}+\{\{\s*(?:col * div *end|col *end|div*col*end|div *end|end *div *col)|Columns\-list)*\}\})*))((?:\s*\*(?:\s*\{\{\s*cite+\}\}|+))*)";
    ArticleText = Regex.Replace(ArticleText, regex, @"$3$4" + "\n\n" + @"$1$2", RegexOptions.IgnoreCase);
    return ArticleText;
}

There are no @, ", ; characters in the regex that need to be escaped, and "Skip if no changes are made" was checked for both runs.

Does anyone know why this is?   ~ Tom.Reding (talkdgaf)  17:44, 24 October 2023 (UTC)

For the record, I can reproduce this result: on my Surface 7, 46 seconds for the find/replace method, and still hanging after 3 minutes for the module code. But the C# method took me 44 seconds in a code snippet independent of any AWB context so, as you probably suspect, there's something odd in the way the module is processed. David Brooks (talk) 18:52, 26 October 2023 (UTC)

@Reedy: given what DavidBrooks said, is this a feature or a known/fixed bug (i.e. should I create a phab ticket for this)?   ~ Tom.Reding (talkdgaf)  16:43, 27 October 2023 (UTC)

Well, I ran it under the debugger and now I'm even more confused. First, the debugger (apparently) decompiles the module code and it turns out it's been optimized (e.g. the last two lines are coalesced, and the @"" version appears as a regular string with escaped \'s). Your version hangs on the assignment of string regex, not on executing the Regex.Replace. Hm, is it too long for either the compiler or the framework? So I chunked the long string and used concatenated literals... and now the string assignment goes through, but the regex replace call hangs instead. Using String.Concat is optimized to the same thing. Using StringBuilder to join the chunks also hangs in the conversion to a string. Creating a Regex object from the long string doesn't help. Not a solution to your problem, I'm afraid, but just more puzzles. Maybe it's a C# 3.5 thing, but the decompiled code looks correct. BTW it's my local build of AWB using Framework 4.8.1 (so it's not a 4.5 problem). David Brooks (talk) 20:58, 27 October 2023 (UTC)

For those who, like me, found the above conclusion barely credible, I dug a little deeper into the low-level code. It turns out that the compiler optimizes out the assignment to ArticleText, but the JITter optimizes out the assignment to the regex string and drops the string directly into the Replace call, which of course contains the hang. It looks like the VS debugger isn't too good at following run-time compiled code. So now I'm beginning to suspect that the fault lies in the version of the assembly (System.Text.RegularExpressions.dll) that contains the Regex class. It's possible, I suppose, that the compiled code binds to an older version of the Framework and that is responsible for the hang, while the find/replace version uses the runtime (Fx 4.5) built into AWB, but here we're at about the limit of where I can figure out runtime CLR bindage. In any case, there may not be a ready solution that AWB could implement. BTW, I did try hacking the source to use v4 of the language, but that didn't help. David Brooks (talk) 14:37, 29 October 2023 (UTC)

If a regular expression takes more than a couple of seconds to run on wp-article lengths of text, then it will be due to catastrophic backtracking. That's not an issue with AWB or C#; it is a fundamental limitation of how backtracking regular expression engines work. Backtracking can sometimes be resolved in tens of seconds or minutes, but it could take years on a sufficiently long input string (as it's an exponential issue). I can't really make sense of the large regex given; what I'd suggest is to separate it into smaller parts and identify which clause or clauses are backtracking, then see if you can adjust them to avoid the issue.

If you are able to write a module, you will probably find it is faster to find candidate text with simple regexes, then do your negative checks/exclusions on only those strings of text matched, and proceed to replace if no exclusions are found, i.e. break things down rather than using one very large find/replace with lookaheads etc. That way any backtracking is limited to a very short string, not the whole text of a wp article. Rjwilmsi 18:23, 29 October 2023 (UTC)
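A minimal sketch of the candidate-then-exclude approach, written as a standalone C# program (top-level statements, so C# 9 / .NET 5 or later) rather than as an AWB module; in a real module the body of the helper would sit inside ProcessArticle. The rule is hypothetical and unrelated to Tom's: capitalize the first letter of each simple template name unless the name is on an exclusion list. A short pattern finds each brace-free {{...}} template, the exclusion check runs only on that short match, and the Regex constructor's match-timeout argument (available since .NET Framework 4.5) makes the engine throw RegexMatchTimeoutException instead of running indefinitely. The helper name RewriteTemplates, the exclusion list, and the two-second limit are illustrative choices.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical rule: capitalize the first letter of each simple template name,
// except for template names on an exclusion list.
string RewriteTemplates(string articleText)
{
    // Step 1: a simple candidate pattern. [^{}|] and [^{}] never match a brace,
    // so each attempt only scans one short template, not the whole article.
    // The 2-second matchTimeout is an arbitrary illustrative limit.
    var candidate = new Regex(@"\{\{([^{}|]+)(\|[^{}]*)?\}\}",
                              RegexOptions.None, TimeSpan.FromSeconds(2));

    // Hypothetical exclusion list, compared case-insensitively.
    var excluded = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "reflist" };

    try
    {
        // Step 2: the exclusion check runs on each short match only.
        return candidate.Replace(articleText, m =>
        {
            string name = m.Groups[1].Value;
            if (excluded.Contains(name.Trim()) || !char.IsLower(name[0]))
                return m.Value; // excluded or already capitalized: leave as-is
            return "{{" + char.ToUpperInvariant(name[0]) + name.Substring(1)
                   + m.Groups[2].Value + "}}";
        });
    }
    catch (RegexMatchTimeoutException)
    {
        // Step 3: give up on this page rather than appear to hang.
        return articleText;
    }
}

Console.WriteLine(RewriteTemplates("{{commons|Foo}} {{reflist}} {{Portal|Bar}}"));
// prints {{Commons|Foo}} {{reflist}} {{Portal|Bar}}
```

The point of the structure is that the expensive decision (excluded or not) is made with ordinary string logic on a few dozen characters, so no part of the pattern has to backtrack across the whole article.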

Yes, but if it is timing out due to backtracking, wouldn't that also apply to the identical RE presented in the Advanced Find/Replace dialog? That does finish in under a minute for Tom and me. BTW, on a rainy Sunday I managed to hack AWB so that the run-time compile would use the same compiler (and System.dll) as I used to build the executable itself, in case there was some inconsistency in the details of string management, but it didn't help. David Brooks (talk) 20:06, 29 October 2023 (UTC)

Well, I noodled on this and found a fix. But (a) it's a source code fix; I haven't yet figured out whether the RE can be tweaked to compensate; (b) I have no idea why it makes a difference; (c) I have no idea if it would introduce regressions. Code in T350636.

tl;dr: during page pre-processing, AWB normalizes line endings from \r\n to \n before running the rule, but not before running the module (which comes first). Making that normalization happen before running the module restores the expected 40-50 second runtime. David Brooks (talk) 22:57, 6 November 2023 (UTC)
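Until a build with that source fix is available, a module could try doing the same normalization itself as its first step. A minimal sketch, assuming the tl;dr above is the whole story and untested inside AWB: it is a standalone C# program (top-level statements, C# 9 / .NET 5 or later), ProcessArticle mirrors the module signature from Tom's code above, and CollapseBlankLines is a hypothetical stand-in rule, not Tom's regex. The stand-in also shows one way \r\n can change what a pattern does, since \n{3,} never matches blank lines that end in \r\n; that is a correctness illustration only and does not explain the slowdown, which David says he cannot account for.

```csharp
using System;
using System.Text.RegularExpressions;

// Convert Windows line endings (\r\n) to \n, as AWB does before
// find-and-replace rules but (per the report above) not before modules.
static string NormalizeLineEndings(string text) => text.Replace("\r\n", "\n");

// Hypothetical stand-in rule: collapse three or more newlines into one blank line.
// The pattern assumes bare \n, so it cannot match \r\n-terminated blank lines.
static string CollapseBlankLines(string text) => Regex.Replace(text, @"\n{3,}", "\n\n");

// Module-shaped sketch using the ProcessArticle signature from the module above;
// in AWB this would be the module method, not a local function.
string ProcessArticle(string ArticleText, string ArticleTitle, int wikiNamespace,
                      out string Summary, out bool Skip)
{
    Skip = false;
    Summary = "collapse blank lines";
    ArticleText = NormalizeLineEndings(ArticleText); // normalize before any regex work
    return CollapseBlankLines(ArticleText);
}

string raw = "Intro\r\n\r\n\r\n\r\n== See also ==";
Console.WriteLine(CollapseBlankLines(raw) == raw); // True: no match on \r\n text
Console.WriteLine(ProcessArticle(raw, "Example", 0, out _, out _) == "Intro\n\n== See also =="); // True
```

Whether AWB expects \r\n back from a module is not covered by the discussion; since AWB normalizes to \n before running rules anyway, returning \n-only text is presumably harmless, but that is an assumption.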

Vital and bannershell

Tracked in Phabricator: Task T330170 (Resolved)

Please see this discussion with User:Primefac about a problem with the placement of WP:VITAL. It was concluded long ago at WP:TALKLEAD that the Vital WikiProject is included in the banner shell. SandyGeorgia (Talk) 11:55, 1 November 2023 (UTC)

@SandyGeorgia: Thanks for the update; I've reopened the AWB request in Phabricator. GoingBatty (talk) 14:04, 1 November 2023 (UTC)

Thanks! SandyGeorgia (Talk) 14:08, 1 November 2023 (UTC)

@SandyGeorgia: This request has been kindly resolved by Rjwilmsi. Now we just need a new version of AWB. GoingBatty (talk) 18:33, 1 November 2023 (UTC)

Thanks all (even if it's Greek to me :) SandyGeorgia (Talk) 18:37, 1 November 2023 (UTC)

I could use the new AWB release, as a while back I had done some cleanup of talk page banners that incorrectly moved Vital out of the banner shell. Any word on when the new release is coming out? Stefen Towers among the rest! GabGruntwerk 18:41, 13 November 2023 (UTC)

The last version was released in 2021, so don't hold your breath. --Trialpears (talk) 21:14, 13 November 2023 (UTC)

Fully aware of that, as I'm a long-time user of AWB, but there had been rumblings about a new release recently. Stefen Towers among the rest! GabGruntwerk 21:17, 13 November 2023 (UTC)

Provinces of Italy

Could you kindly use a bot to change all "Provinces of..." to "provinces of..." (with a lowercase initial)? Lowercase is correct: almost all sources, whether in Italian, French or other languages, write "provinces" with a lowercase initial. To make it clearer: as in the "Province of Pordenone" page and not as in the "Province of Rovigo" page. Thanks in advance. JackkBrown (talk) 14:44, 7 November 2023 (UTC)

@JackkBrown: Already posted at Wikipedia:AutoWikiBrowser/Tasks#Provinces of Italy. Let's keep the discussion there. GoingBatty (talk) 15:06, 7 November 2023 (UTC)

Need help with AWB

Hi, on this page, Wikipedia:Vital_articles/Level/5/History, I am trying to put the events prior to 1945 in the late modern section and the events after 1945 in the Contemporary section. I was hoping that someone could provide me with step-by-step instructions on how I can sort the events using AWB. Thank you. Interstellarity (talk) 14:20, 8 November 2023 (UTC)

I'm not sure AWB is the best tool (outside of generic section ordering) for chronologically re-sorting or similarly rearranging content on a page. Stefen Towers among the rest! GabGruntwerk 20:48, 8 November 2023 (UTC)

@StefenTower What would be a better tool to use besides AWB? Interstellarity (talk) 00:50, 9 November 2023 (UTC)

Manual labor. :) Seriously, this is just one page being edited, and sometimes editing can be tedious. There's not always a tool to help us. Stefen Towers among the rest! GabGruntwerk 00:58, 9 November 2023 (UTC)

Maybe Excel. Neils51 (talk) 02:51, 9 November 2023 (UTC)

This does not look like a job for AWB. The information is simply not on the page: good luck guessing which section Great Recession goes in without reading the article. Perhaps you could split Modern and Contemporary into separate sandboxes, use a tool such as PetScan or Quarry to see which pages linked from each sandbox have a Category: containing four digits that are a year in the wrong era (beware: "Category:1940s whatever" may be either) and move them by hand. Theoretically, a complex AWB module might be able to do this; in practice even the ablest programmer could do it much quicker manually. Certes (talk) 09:48, 9 November 2023 (UTC)
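A rough sketch of the year check Certes describes, assuming the category names for each linked page have already been exported (for example from a PetScan or Quarry result). The ClassifyCategory helper, the sample category names, and the choice to count 1945 itself as Contemporary are all hypothetical; the program is standalone C# with top-level statements (C# 9 / .NET 5 or later). It reads the first four-digit year in a category name and flags decade categories that straddle 1945, such as "Category:1940s whatever", as ambiguous, so those still need a human decision.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical classifier for the Late modern / Contemporary split.
// Assumption: 1945 itself counts as Contemporary.
static string ClassifyCategory(string category)
{
    // First four-digit year (1000-2099), optionally followed by "s" for a decade.
    Match m = Regex.Match(category, @"\b(1\d{3}|20\d{2})(s?)\b");
    if (!m.Success) return "unknown"; // no year: someone has to read the article
    int year = int.Parse(m.Groups[1].Value);

    if (m.Groups[2].Value == "s")
    {
        // A decade category such as "1940s" covers year .. year + 9.
        if (year + 9 < 1945) return "late modern";
        if (year >= 1945) return "contemporary";
        return "ambiguous"; // the decade straddles 1945
    }
    return year < 1945 ? "late modern" : "contemporary";
}

foreach (string c in new[] { "Category:1939 in Germany", "Category:1940s conflicts",
                             "Category:2008 financial crisis", "Category:Great Recession" })
{
    Console.WriteLine(c + " -> " + ClassifyCategory(c));
}
```

A check like this would only narrow the list; pages with no dated category, like Great Recession, or with misleading ones (births, anniversaries) would still have to be moved by hand, as Certes says.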

Auto saving changes to multiple pages

Hi. I'm trying to edit some repetitive text out of some files on Wikimedia Commons. Unfortunately there are thousands of instances of it. So I'm going through individual folders to edit each file, which I don't necessarily have a problem with, but if I load all the files into the pages list and click save, it only edits a single file at a time. So is there a way to batch save the edit to all the files in the list without having to click save thousands of times? Otherwise I'm going to have to click save 65,000 times, which I'd rather not do if I can just do all the edits at once. Thanks. Adamant1 (talk) 03:01, 10 November 2023 (UTC)

@Adamant1: Hi there! You could consider creating a bot. Instead of clicking save each time, you'd load the list and the bot would click save once every 10 seconds based on the rules you set up in AWB. GoingBatty (talk) 05:11, 10 November 2023 (UTC)

I'll have to look into that. I'm not really up on how to create bots, but it's better than nothing. --Adamant1 (talk) 07:09, 10 November 2023 (UTC)

There are people with bots who will often accept requests and handle the coding, approval process, testing, etc., if you have a well-formed proposal such as a tested AWB setup. Go to WP:BOTREQ. Dicklyon (talk) 19:13, 10 November 2023 (UTC)

Hatnote error

Tracked in Phabricator: Task T293603 (Resolved)

Apparently, in this edit, AWB moved the "hatnote group" shell below the actual hatnotes (rather than surrounding them), leaving an error message on the page. Can this be fixed to avoid repetition? BD2412 T 01:31, 14 November 2023 (UTC)

@BD2412: see #GENFIX error above; it's fixed in the sandbox, but still pending a version update.   ~ Tom.Reding (talkdgaf)  02:00, 14 November 2023 (UTC)

Understood, thanks. BD2412 T 02:08, 14 November 2023 (UTC)

Question

Why should editors request permission for AutoWikiBrowser? Shouldn't the application be open to use without registering? Toadette (let's chat together) 11:33, 17 November 2023 (UTC)

@ToadetteEdit: Using AWB makes it easy to vandalize a large number of pages very quickly. Requesting permission gives the admins the ability to look at a user's contribution history and confirm they're here to build an encyclopedia before granting access. GoingBatty (talk) 16:52, 17 November 2023 (UTC)

Unofficial release

Several people have been asking for a release of the latest builds. If you trust me <insert snarky comment here>, I've thrown a build of the latest source (revision 12554, dated Nov 1) up on GitHub. Go to https://github.com/DavidWBrooks/UnofficialAWB/releases/latest, and click AutoWikiBrowser6211.zip. You can then follow the installation instructions from Wikipedia:AutoWikiBrowser#(2) Download. As they say, it works for me. David Brooks (talk) 23:10, 17 November 2023 (UTC)

@DavidBrooks: Works well for me - thanks so much!!! GoingBatty (talk) 03:08, 18 November 2023 (UTC)

Working for me as well. Thank you! Stefen Towers among the rest! GabGruntwerk 06:21, 18 November 2023 (UTC)