This is the discussion page for the AWB project. It is also the place to discuss using the AWB program itself (if you need help, or have a question about AWB, etc.). Where to make specific types of reports or requests is explained in the Before you post section below. Before asking questions, please read the Frequently asked questions below.
Do you want to ... | Please use |
---|---|
Report a bug or request a feature in AWB? | Check reported tasks before filing a new task. You do not need to create another account there; just log in with your normal Wikimedia account. See this MediaWiki wiki page on how to report bugs and request features on Phabricator. |
Report an incorrectly fixed typo? | Wikipedia talk:AutoWikiBrowser/Typos |
Request approval to use AWB? | Wikipedia:Requests for permissions/AutoWikiBrowser |
Ask a question about AWB or ask for help? | This page |
In this diff, AWB's GENFIX set messed up an implementation of {{hatnote group}}. Could this be fixed to resolve the error? {{u|Sdkb}} talk 04:38, 5 October 2023 (UTC)
I have run into that error as well. What I saw was when AWB seeks to replace a redirect to the template, it ungroups the contents and places the {{hatnote group}} template separately beneath what it had previously grouped. Stefen Towers among the rest! Gab • Gruntwerk 04:46, 5 October 2023 (UTC)
It was logged as a bug a couple of years ago. -- John of Reading (talk) 06:52, 5 October 2023 (UTC)
@Sdkb, StefenTower, and John of Reading: I received an email this morning that Rjwilmsi has fixed this issue. @Rjwilmsi: What are the plans to release an updated version of AWB with this fix (and hopefully resolve a few more bugs beforehand)? Thanks!
You would need to arrange with Reedy if you think a new AWB release is worthwhile. Rjwilmsi 17:55, 5 October 2023 (UTC)
I find it weird that AWB releases seem to be done in giant versions, rather than small updates automatically pushed out. The latter seems the more modern approach. {{u|Sdkb}} talk 18:00, 5 October 2023 (UTC)
@Reedy: Could we please have an updated version of AWB soon (hopefully with a few more resolved bugs)? Thanks! GoingBatty (talk) 05:27, 6 October 2023 (UTC)
@Rjwilmsi: Is Reedy the only one who can release a new version of AWB? Reedy hasn't been very active here lately. GoingBatty (talk) 22:13, 27 October 2023 (UTC)
Effectively yes. I can do local builds but on my setup (MonoDevelop/Linux) I can't do a full clean build as Reedy has updated the AWB solution to use C# reference libraries etc. that MonoDevelop can't (yet) handle. Also the AWB release process requires changes to admin-protected pages to update release versions. If Reedy doesn't respond then I suppose I'll have to get Visual Studio set up on a spare Windows machine so I can do a full build and then hopefully we can find another admin to get the AWB version page updated. Rjwilmsi 18:27, 29 October 2023 (UTC)
This certainly points to a systemic issue. AWB ideally should be converted to an online tool (rather than a program you have to download) that can be updated constantly whenever there is a fix needed. {{u|Sdkb}} talk 19:21, 29 October 2023 (UTC)
@Rjwilmsi: Reedy (talk · contribs) hasn't been online since 1 October, and not very often all year. I reached out to Reedy on #AutoWikiBrowser connect, but didn't get a response, so any help you could provide would be appreciated. GoingBatty (talk) 03:30, 14 November 2023 (UTC)
It would be nice if we can figure out a way to share build duties at least, and have more regular releases. I'm a former software developer and would like to see if I can build it, but only if all the required tools are free. That is, is Visual Studio Community enough for the task? Stefen Towers among the rest! Gab • Gruntwerk 00:41, 18 November 2023 (UTC)
I'm also a software developer, but Java rather than C#. Regardless, I'm trying to follow the instructions here. After getting through Microsoft's gross privacy-invading processes to download an old version of Visual Studio, I now have it installed. (It was by comparison much easier to install TortoiseSVN.) I've now started VS and opened the AWB project, but I don't know what this instruction means - "When the IDE has loaded, select release rather than debug (next to the green forward arrow).". If anyone can enlighten me, that would be much appreciated. Cheers, Kiwipete (talk) 02:48, 18 November 2023 (UTC)
I decided to compare the speed of a find-and-replace rule with the identical rule in C#, both run on German Empire, thinking C# would be somewhat faster. I've found the exact opposite, however.
The following find-and-replace rule:
Find: (\=+\s*(?:(?:Foot)?Notes|Further reading)\s*\=+)((?:\s*\*?\s*\{\{\s*(?:Wik|Commons|Reflist|Refbegin|Refend|notes?list|notes|cit)*\}\}\.?|\<references\s*/\>|\s*\<ref +name+/\>|\s*\<ref +name+\>*?\</\s*ref\>|\s*\</\s*ref\>|\s*\}\}|\s*\<\!\-\-\s*(?!\{\{(?:Wik|Commons))*?\-\-\>|\s*?+*\*+)+)(\s*=+\s*See also\s*=*(?:(?:\s*\{\{(?:Portal|C?Commons|C ?cat|cc(?=\s*)|Wik|(?:col *div|colbegin|cols|div *2col|div *col *begin|div *col *start|div*col|divbegin|divided *column)*\}\}+\{\{\s*(?:col * div *end|col *end|div*col*end|div *end|end *div *col)|Columns\-list)*\}\})*))((?:\s*\*(?:\s*\{\{\s*cite+\}\}|+))*)
Replace with: $3$4 $1$2
with "Regular expression" checkbox checked, the others unchecked, "Apply No. of times" = 1, and nothing in the "If" tab, took an average of 64.75s to run over 4 runs (66, 65, 64, 64s).
The following C# module code, however, has been running (hanging), for over 30 minutes:
public string ProcessArticle(string ArticleText, string ArticleTitle, int wikiNamespace, out string Summary, out bool Skip)
{
    Skip = false;
    Summary = "Summary";
    string regex = @"(\=+\s*(?:(?:Foot)?Notes|Further reading)\s*\=+)((?:\s*\*?\s*\{\{\s*(?:Wik|Commons|Reflist|Refbegin|Refend|notes?list|notes|cit)*\}\}\.?|\<references\s*/\>|\s*\<ref +name+/\>|\s*\<ref +name+\>*?\</\s*ref\>|\s*\</\s*ref\>|\s*\}\}|\s*\<\!\-\-\s*(?!\{\{(?:Wik|Commons))*?\-\-\>|\s*?+*\*+)+)(\s*=+\s*See also\s*=*(?:(?:\s*\{\{(?:Portal|C?Commons|C ?cat|cc(?=\s*)|Wik|(?:col *div|colbegin|cols|div *2col|div *col *begin|div *col *start|div*col|divbegin|divided *column)*\}\}+\{\{\s*(?:col * div *end|col *end|div*col*end|div *end|end *div *col)|Columns\-list)*\}\})*))((?:\s*\*(?:\s*\{\{\s*cite+\}\}|+))*)";
    ArticleText = Regex.Replace(ArticleText, regex, @"$3$4" + "\n\n" + @"$1$2", RegexOptions.IgnoreCase);
    return ArticleText;
}
There are no @, ", ; characters in the regex that need to be escaped, and "Skip if no changes are made" was checked for both runs.
Does anyone know why this is? ~ Tom.Reding (talk ⋅dgaf) 17:44, 24 October 2023 (UTC)
For the record, I can reproduce this result: on my Surface 7, 46 seconds for the find/replace method, and still hanging after 3 minutes for the module code. But the C# method took me 44 seconds in a code snippet independent of any AWB context so, as you probably suspect, there's something odd in the way the module is processed. David Brooks (talk) 18:52, 26 October 2023 (UTC)
@Reedy: given what DavidBrooks said, is this a feature or a known/fixed bug (i.e. should I create a phab ticket for this)? ~ Tom.Reding (talk ⋅dgaf) 16:43, 27 October 2023 (UTC)
Well, I ran it under the debugger and now I'm even more confused. First, the debugger (apparently) decompiles the module code and it turns out it's been optimized (e.g. the last two lines are coalesced, and the @"" version appears as a regular string with escaped \'s). Your version hangs on the assignment of string regex, not on executing the Regex.Replace. Hm, is it too long for either the compiler or the framework? So I chunked the long string and used concatenated literals... and now the string assignment goes through but the regex replace call now hangs. Using String.Concat is optimized to the same thing. Using StringBuilder to join the chunks also hangs in the conversion to a string. Creating a Regex object from the long string doesn't help. Not a solution to your problem, I'm afraid, but just more puzzles. Maybe it's a C# 3.5 thing, but the decompiled code looks correct. BTW it's my local build of AWB using Framework 4.8.1 (so it's not a 4.5 problem). David Brooks (talk) 20:58, 27 October 2023 (UTC)
For those who, like me, found the above conclusion barely credible, I dug a little deeper into the low level code. Turns out that the compiler optimizes out the assignment to ArticleText, but the JITter optimizes out the assignment to the regex string and drops the string directly into the Replace call, which of course contains the hang. It looks like the VS debugger isn't too good at following run-time compiled code. So now I'm beginning to suspect that the fault lies in the version of the assembly (System.Text.RegularExpressions.dll) that contains the Regex class. It's possible, I suppose, that the compiled code binds to an older version of the Framework and that is responsible for the hang, while the find/replace version uses the runtime (Fx 4.5) built into AWB, but here we're at about the limit of where I can figure out runtime CLR bindage. In any case, there may not be a ready solution that AWB could implement. BTW, I did try hacking the source to use v4 of the language, but that didn't help. David Brooks (talk) 14:37, 29 October 2023 (UTC)
If a regular expression takes more than a couple of seconds to run on wp-article lengths of text then it will be due to catastrophic backtracking. That's not an issue with AWB or C#, it is a fundamental limitation of how regular expressions work. Backtracking can sometimes be resolved in 10s of seconds or minutes, but it could take years on a sufficiently long input string (as it's an exponential issue). I can't really make sense of the large regex expression given; what I'd suggest is to separate it into smaller parts and identify which clause or clauses are backtracking, then see if you can adjust them to avoid the issue.
If you are able to write a module you will probably find it is faster to find candidate text with simple regexes, then do your negative checks/exclusions on only those strings of text matched, and proceed to replace if no exclusions found, i.e. breaking things down rather than one very large find/replace with lookaheads etc. That way any backtracking is limited to a very short string, not the whole text of a wp article etc. Rjwilmsi 18:23, 29 October 2023 (UTC)
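The "find candidates with simple regexes, then check exclusions on the matched text only" idea could look roughly like the sketch below. It is written in Python for brevity, not as an AWB C# module, and everything in it is an assumption made for illustration: the function names, the restriction to level-2 "Notes" and "See also" headings, and the example exclusion (skip if the section contains an HTML comment). It is not a replacement for the rule posted above.

```python
import re

# Stage 1: a simple pattern that only locates level-2 headings.
# It has no nested quantifiers, so matching stays cheap on long pages.
HEADING = re.compile(r"^==[ \t]*([^=\n]+?)[ \t]*==[ \t]*$", re.MULTILINE)


def split_sections(text):
    """Return (lead, [(title, body), ...]); each body starts at its heading."""
    matches = list(HEADING.finditer(text))
    if not matches:
        return text, []
    lead = text[:matches[0].start()]
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((m.group(1).strip().lower(), text[m.start():end]))
    return lead, sections


def move_see_also_before_notes(text):
    """Hypothetical rule: put 'See also' ahead of 'Notes' when it follows it."""
    lead, sections = split_sections(text)
    titles = [title for title, _ in sections]
    if "see also" not in titles or "notes" not in titles:
        return text  # no candidate sections, nothing to do
    see, notes = titles.index("see also"), titles.index("notes")
    if see < notes:
        return text  # already in the wanted order
    # Stage 2: exclusion check on the short candidate only (assumed example).
    if "<!--" in sections[see][1]:
        return text
    title, body = sections.pop(see)
    if not body.endswith("\n"):
        body += "\n"  # keep the next heading on its own line
    sections.insert(notes, (title, body))
    return lead + "".join(body for _, body in sections)
```

Because each check runs on one heading line or one section at a time, any backtracking is bounded by the length of that piece rather than by the length of the whole article.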
Yes, but if it is timing out due to backtracking, wouldn't that also apply to the identical RE presented in the Advanced Find/Replace dialog? That does finish in under a minute for Tom and me. BTW, on a rainy Sunday I managed to hack AWB so that the run-time compile would use the same compiler (and System.dll) as I used to build the executable itself, in case there was some inconsistency in the details of string management, but no help. David Brooks (talk) 20:06, 29 October 2023 (UTC)
Well, I noodled on this and found a fix. But (a) it's a source code fix, and I haven't yet figured out whether the RE can be tweaked to compensate; (b) I have no idea why it makes a difference; (c) I have no idea if it would introduce regressions. Code in T350636.
tl;dr: during page pre-processing, AWB normalizes line endings from \r\n to \n before running the rule, but not before running the module (which comes first). Making that normalization happen before running the module restores the expected 40-50 second runtime. David Brooks (talk) 22:57, 6 November 2023 (UTC)
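As a small illustration of why the same pattern can behave differently on \r\n and \n text, here is a Python sketch (not AWB's C# code). The helper name `normalize_newlines` and the sample pattern are made up for this example, and it does not attempt to explain the hang itself; it only shows that a pattern written with \n line endings in mind can stop matching when the text still contains \r\n.

```python
import re


def normalize_newlines(text):
    """Convert Windows (CR LF) line endings to Unix (LF) line endings."""
    return text.replace("\r\n", "\n")


# A pattern that assumes the heading line ends immediately before "\n".
HEADING = re.compile(r"^== See also ==$", re.MULTILINE)

raw = "Intro\r\n== See also ==\r\n* [[Foo]]\r\n"

# On the raw text the "\r" sits between the heading and the "\n",
# so "$" cannot match directly after the heading.
before = HEADING.search(raw)

# After normalizing, the same pattern finds the heading.
after = HEADING.search(normalize_newlines(raw))
```

A module that receives un-normalized text may therefore exercise quite different matching paths from a find-and-replace rule that receives normalized text, even when the regular expression is character-for-character identical.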
Please see this discussion with User:Primefac about a problem with the placement of WP:VITAL. It was concluded long ago at WP:TALKLEAD that Vital WikiProject is included in the banner shell. SandyGeorgia (Talk) 11:55, 1 November 2023 (UTC)
@SandyGeorgia: Thanks for the update - I've reopened the AWB request in Phabricator. GoingBatty (talk) 14:04, 1 November 2023 (UTC)
Thanks! SandyGeorgia (Talk) 14:08, 1 November 2023 (UTC)
@SandyGeorgia: This request has been kindly resolved by Rjwilmsi. Now we just need a new version of AWB. GoingBatty (talk) 18:33, 1 November 2023 (UTC)
Thanks all (even if it's Greek to me :) SandyGeorgia (Talk) 18:37, 1 November 2023 (UTC)
I could use the new AWB release, as a while back, I had done some cleanup of talk page banners with this incorrect moving out of vital from the banner shell. Any word on when the new release is coming out? Stefen Towers among the rest! Gab • Gruntwerk 18:41, 13 November 2023 (UTC)
Last version was released 2021, so don't hold your breath. --Trialpears (talk) 21:14, 13 November 2023 (UTC)
Fully aware of that as I'm a long-time user of AWB, but there had been rumblings about a new release recently. Stefen Towers among the rest! Gab • Gruntwerk 21:17, 13 November 2023 (UTC)
Could you, kindly, use a bot to make all "Provinces of..." into "provinces of..." (with a lowercase initial)? It is correct in lowercase; almost all sources, whether in Italian, French or other languages, write "provinces" with a lowercase initial. To make it clearer, as in the "Province of Pordenone" page and not as in the "Province of Rovigo" page. Thanks in advance. JackkBrown (talk) 14:44, 7 November 2023 (UTC)
@JackkBrown: Already posted at Wikipedia:AutoWikiBrowser/Tasks#Provinces of Italy. Let's keep the discussion there. GoingBatty (talk) 15:06, 7 November 2023 (UTC)
Hi, on this page, Wikipedia:Vital_articles/Level/5/History, I am trying to put the events prior to 1945 in the late modern section and the events after 1945 in the Contemporary section. I was hoping that someone can provide me with step-by-step instructions on how I can sort the events using AWB. Thank you. Interstellarity (talk) 14:20, 8 November 2023 (UTC)
I'm not sure AWB is the best tool (outside of generic section ordering) for chronologically re-sorting or similarly rearranging content on a page. Stefen Towers among the rest! Gab • Gruntwerk 20:48, 8 November 2023 (UTC)
@StefenTower What would be a better tool to use besides AWB? Interstellarity (talk) 00:50, 9 November 2023 (UTC)
Manual labor. :) Seriously, this is just one page being edited, and sometimes editing can be tedious. There's not always a tool to help us. Stefen Towers among the rest! Gab • Gruntwerk 00:58, 9 November 2023 (UTC)
Maybe Excel. Neils51 (talk) 02:51, 9 November 2023 (UTC)
This does not look like a job for AWB. The information is simply not on the page: good luck guessing which section Great Recession goes in without reading the article. Perhaps you could split Modern and Contemporary into separate sandboxes, use a tool such as PetScan or Quarry to see which pages linked from each sandbox have a Category: containing four digits that are a year in the wrong era (beware: "Category:1940s whatever" may be either) and move them by hand. Theoretically, a complex AWB module might be able to do this; in practice even the ablest programmer could do it much quicker manually. Certes (talk) 09:48, 9 November 2023 (UTC)
Hi. I'm trying to edit some repetitive text out of some files on Wikimedia Commons. Unfortunately there are thousands of instances of it. So I'm going through individual folders to edit each file, which I don't necessarily have a problem with, but if I load all the files into the pages list and click save it only edits a single file at a time. So is there a way to batch save the edit to all the files in the list without having to click save thousands of times? Otherwise I'm going to have to click save 65,000 times, which I'd rather not do if I can just do all the edits at once. Thanks. Adamant1 (talk) 03:01, 10 November 2023 (UTC)
@Adamant1: Hi there! You could consider creating a bot. Instead of clicking save each time, you'd load the list and the bot would click save once every 10 seconds based on the rules you set up in AWB. GoingBatty (talk) 05:11, 10 November 2023 (UTC)
I'll have to look into that. I'm not really up on how to create bots but it's better than nothing. --Adamant1 (talk) 07:09, 10 November 2023 (UTC)
There are people with bots who will often accept requests, handle the coding, approval process, testing, etc., if you have a well-formed proposal such as a tested AWB setup. Go to WP:BOTREQ. Dicklyon (talk) 19:13, 10 November 2023 (UTC)
Apparently, in this edit, AWB moved the "hatnote group" shell below the actual hatnotes (rather than surrounding them), leaving an error message on the page. Can this be fixed to avoid repetition? BD2412 T 01:31, 14 November 2023 (UTC)
@BD2412: see above @ #GENFIX error - it's fixed in the sandbox, but still pending a version update. ~ Tom.Reding (talk ⋅dgaf) 02:00, 14 November 2023 (UTC)
Understood, thanks. BD2412 T 02:08, 14 November 2023 (UTC)
Why should editors request permission for AutoWikiBrowser? Should the application be open without registering? Toadette (let's chat together) 11:33, 17 November 2023 (UTC)
@ToadetteEdit: Using AWB makes it easy to vandalize a large number of pages very quickly. Requesting permission gives the admins the ability to look at a user's contribution history and confirm they're here to build an encyclopedia before granting access. GoingBatty (talk) 16:52, 17 November 2023 (UTC)
Several people have been asking for a release of the latest builds. If you trust me <insert snarky comment here>, I've thrown a build of the latest release - revision 12554, dated Nov 1 - up on github. Go to https://github.com/DavidWBrooks/UnofficialAWB/releases/latest, and click AutoWikiBrowser6211.zip. You can then follow the installation instructions from Wikipedia:AutoWikiBrowser#(2) Download. As they say, it works for me. David Brooks (talk) 23:10, 17 November 2023 (UTC)
@DavidBrooks: Works well for me - thanks so much!!! GoingBatty (talk) 03:08, 18 November 2023 (UTC)
Working for me as well. Thank you! Stefen Towers among the rest! Gab • Gruntwerk 06:21, 18 November 2023 (UTC)