Who Really Wrote the Works of the British Renaissance?

This conversation was continued by Who Really Wrote the Works of the British Renaissance? thread 2.

Talk about LibraryThing



1 AbigailAdams26
Edited: Dec 3, 2021, 9:44 pm

Hi All: I want to draw your attention to a new interview we've just posted on the LibraryThing blog, with scholar Anna Faktorovich, in which we discuss her new project, the British Renaissance Re-Attribution and Modernization Series.

ER participants will recognize this name, as the author's main study, as well as some of the newly available texts from the period that she has presented in the series, were offered as giveaways during this past month's ER batch.

Dr. Faktorovich discusses her new computational-linguistic model, and how she has used it in the study of 284 works of the British Renaissance, coming to the conclusion that these works were written by the same six ghostwriters.

Come read her argument, and tell us what you think!

https://blog.librarything.com/2021/12/an-interview-with-scholar-anna-faktorovich...

2 thorold
Dec 4, 2021, 6:43 am

For example, the data indicates only two linguistic signatures between the three Bronte sisters, suggesting it is likely the initial assignment of these texts to only two male brothers was more accurate than the current belief three women wrote them.


Hmmm. I think that's the point where you would normally discard the method as unreliable and try something else.

3 Nicole_VanK
Dec 4, 2021, 7:51 am

It's "interesting", but I'm inclined to to agree with >2 thorold: that this is dubious methodology.

4 MarthaJeanne
Dec 4, 2021, 8:45 am

Amazing what you can 'prove' if you play around with numbers long enough.

5 curiousstr.eam7
Dec 4, 2021, 8:46 am

This user has been deleted because they were considered spam.

6 Crypto-Willobie
Dec 4, 2021, 9:10 am

What a load of bollocks.

7 aspirit
Edited: Dec 4, 2021, 9:32 am

>1 AbigailAdams26: I think LibraryThing's spotlighting the argument without providing counter arguments is embarrassing and a bit threatening, considering the risks to us if responses boost the interview.

8 lilithcat
Dec 4, 2021, 9:36 am

>7 aspirit:

This author also spammed the forums, which is another reason not to highlight her work. Such behavior should not be rewarded.

9 DuncanHill
Dec 4, 2021, 10:37 am

"the “Shakespeare” plays and poetry translated into Modern English registered as a separate linguistic signature from these same texts in their original spelling" = "My versions of Shakespeare read like they were written by me". The idea that the style of a translation can be used to identify the original author is so bizarre it barely qualifies as an idea.

"I learned about the £2,400 loan William and his brother Henry Percy (Earl of Northumberland) took out from Arthur Medleycote (London merchant tailor) in 1593, just before the granting of the theater duopoly by Elizabeth I in 1594. This documented proof, without any corresponding record of what else William could have spent this sum on, firmly establishes that William re-invested this sum in troupe-development and theater-building in London" - no, it establishes nothing of the sort. It only establishes that you don't know what he did with the money. Given Percy was imprisoned for debt more than once, and we know that his brother paid for his food in prison, a more likely use would be to satisfy his latest creditors.

"As I mentioned, none of William Percy’s plays or poetry, or the plays I am re-attributing to him in this part of the series, have ever been translated into accessible Modern English before" - Rowse years ago did his "Contemporary Shakespeare" which while not entirely modernising the language does make it eminently accessible.

I think the claim for Percy as Shakespeare is little more than the usual snobbish refusal to accept that a bright middle-class boy from an ambitious family who went to a decent provincial school could go on to make a success of himself, and so to claim an aristocratic candidate instead. It's an attitude that's been around for years and seems to me to shew a remarkable ignorance about Elizabethan society.

As for the Brontës being two brothers, we really are ascending to Nephelokokkygia.

I really don't think LibraryThing should be promoting this.

10 anglemark
Dec 4, 2021, 10:58 am

This brings to mind how the science fiction magazine Astounding went down after the editor started to push Dianetics and psionics.

11 Petroglyph
Edited: Dec 4, 2021, 11:13 am

>1 AbigailAdams26:
This is the literary equivalent of those linguistic conspiracy theorists who derive all languages from Hebrew, or who claim that Serbian / Sanskrit / Tamil are the oldest language and that all others derive from them. There's a few nationalistic Hungarian foot soldiers out there connecting the language of Linear A to Hungarian.

Faktorovich's arguments run along similar lines: the establishment stinks; she's uncovered a massive cover-up or at least some sort of centuries-long blind spot; her ideas have extremely wide-ranging implications while vindicating a downtrodden minority; she ought to be recognized as a revolutionary in the field; Great Scott, the lone maverick's done it again!

At several points during this interview she says that for anyone to even criticize her arguments, they'd have to read, work through and refute several other of her books. ("You really have to read Volumes 1-2 to understand how overwhelming the evidence is for these conclusions, as reading this summary alone could not possibly convince anybody that the history with which they are familiar is entirely incorrect"). Right. In reality, her methodology is faulty and the whole thing is built on the bane of much computationally-supported research: GIGO.

Digital humanities is a fairly young field, and there's no doubt that applying Big Data and computational methods like stylometry will allow us to answer questions we can't without those methods. Callaway's fun experiment in untangling the two authors of Good Omens is a good example of that. Faktorovich's work, in contrast, looks, walks and talks like quackery.

ETA: I don't think this is the sort of thing that LT should be highlighting. Yikes!

12 MarthaJeanne
Dec 4, 2021, 11:15 am

I find it disappointing that if she is reading an average of at least 6 books a day, she can't even name one or two of them.

13 amanda4242
Edited: Dec 4, 2021, 11:35 am

Wow, this is all kinds of wrong. Why is LT promoting this person's dodgy work?

14 Crypto-Willobie
Dec 4, 2021, 11:36 am

Just at the common sense level --- can you really believe that all the works of the English renaissance were written by just six people?????

15 abbottthomas
Dec 4, 2021, 11:47 am

Where is proximity1 when we need him?

16 2wonderY
Dec 4, 2021, 11:47 am

>12 MarthaJeanne: Yeah, that question went way off course! I had at some point expected to look up her analysis of genre literature; but seeing where she goes with other topics, it’s probably quite worthless.

17 norabelle414
Dec 4, 2021, 12:04 pm

I agree with everyone else. It's one thing to interview an author with questions like "what challenges do you have when self-publishing scholarly non-fiction?" and another thing entirely to promote self-published claims as if they have inherent scientific merit without any kind of corroboration or peer review.

18 anglemark
Dec 4, 2021, 12:04 pm

19 faktorovich
Dec 4, 2021, 1:03 pm

>2 thorold: Here is an excerpt from my forthcoming article on this topic that explains this point in full:

The contracted phrase "I can’t" only appears in “Anne Bronte’s” Agnes and in “Emily Bronte’s” Wuthering, confirming that the signature behind Wuthering is at least partially “Anne’s”… "I did not" appears among the top-6 phrases in every one of the 5 tested “Bronte” texts (in 4 of these as the 1st or 2nd most frequent)—“Emily”, “Anne” and “Charlotte’s” —and only appears in one other text, oddly enough, in “Thoreau’s” Walden… Out of these anomalies, the least surprising should be the “Brontes” because their three eventually feminine bylines commenced under male pseudonyms of “Currer”, “Ellis” and “Acton Bell”. Only one of the sisters, Charlotte, had a basic education, and she is claimed to have taught her sisters how to write highbrow literature at home. The death of two of the Bronte sisters—Emily in 1848 and Anne in 1849—coincided with the re-attribution of the masculine “Bell” pseudonyms to the three feminine “Brontes”. While most modern critics claim the “Brontes’” fame was instantaneous and pre-dated the re-gendering, this re-marketing under feminine names has clearly been the main element that canonized the books attributed to them. My findings from the Renaissance indicate that borrowing the names of the recently deceased consistently succeeds in elevating the texts with these bylines to much greater heights in popularity than rival bylines of the healthy and living. Examples of this trend include “Christopher Marlowe” and “Philip Sidney”, both of whom only began having books released with their byline after their deaths. The “Brontes’” novel-publishing history begins in 1847 with the first printing of “Currer Bell’s” Jane Eyre by Smith, Elder. The claimed instantaneous popularity of Jane, inspired Thomas Coutley Newby to accept and to release a couple of months later the multi-volume set of “Ellis Bell’s” Wuthering and “Acton Bell’s” Agnes Grey; both were then re-printed under the “Emily” and “Anne” bylines and with “Charlotte” as the attributed editor with Newby’s press in 1850. Back in 1847, Newby’s first advertisements of Wuthering and Agnes claimed they were new novels by the same author as the author of Jane Eyre, while contradictorily specifying that Wuthering was by “Acton”, and not by “Ellis” (as he claimed on the 1847 title-page) or by “Currer” (as would have been consistent with the Jane’s-byline claim). And when Newby sold American copyrights, the first American edition of Wuthering carried the “Currer Bell” byline, again contradicting the 1847 British edition’s “Ellis” byline (Newman, B. (New York: Broadview Editions, 2007). Wuthering Heights, 33-4.). Arguably the first “Bronte” sister to receive a byline in this saga was “Charlotte Bronte” when her byline appeared just above “Currer Bell’s” in its second printing of Jane in December of 1847, but within the book her name only appears in association with the “Preface”, while even the dedication to Thackeray is from merely “The Author”. The 1849 edition of Shirley still carries the “Currer Bell, Author of Jane Eyre” byline; and the Leipzig, Smith and other 1850 editions of Jane still have “Currer Bell” in their bylines. The first mention of “Charlotte Bronte” as the author of Jane appears to be a strange bound set of two letters called “A memento of her friendship with Mrs. Elizabeth Gaskell her Biographer” dated 1851-3, and currently stored as a manuscript in the University of Leeds library. 
The numerous changes in these bylines have been dismissed as the outcome of women being censored from accessing publishers under their own names, and thus going to extraordinary length to confuse even their male pseudonyms to avoid being discovered as the underlying authors. But this gender-disguising subterfuge hypothesis is contradicted by the occasional public acceptance of the female bylines by the Bronte family, in parallel with the continuing mis-attributions to the wrong masculine “Bell” bylines of some of the still un-feminized texts. The publishing history of the “Bronte” novels is one of the most provable cases of ghostwriting, pseudonym-usage and mis-attribution that has not been seriously questioned by scholars because the simple re-gendering solution to this mystery fits with the propaganda of feminist progress for British women even in the nineteenth century, when they were still barred from enrolling into universities, from voting, and most other basic human rights./ With this publishing history in mind, we can now tackle the computational-linguistics attributions of the “Bronte” texts. “Unmasking” claims that “Charlotte Bronte’s” Professor matched both of her sisters. This assertion is false according to my data because The Professor is a strong 11-test match to “Charlotte’s” second tested novel, Jane Eyre, and it is a relative non-match at 6-9 tests to the other three tested “Bronte” texts. In my data, the main anomaly among the “Brontes” is not “Charlotte”, but instead “Emily’s” Wuthering Heights. Just as Newby had initially claimed that Wuthering was by the same author as Jane, there is an extremely strong similarity-match in 18-tests between these two novels, confirming that they share a single dominant author. Since the only other text that has been attributed to “Ellis Bell” (re-assigned to “Emily”) was the 1846 collection of poems that includes the tri-byline “Currer, Ellis, and Acton”; it logically follows that there are only two linguistic styles or two co-writers despite the triplicates of “Bronte” and “Bell” pseudonyms assigned to their texts. On my 28-tests, there is a clear set of matches between “Anne’s” Agnes and Tenant on 11-12 tests, and “Charlotte’s” Jane matches Professor on 11 tests. Wuthering appears to have been written by the linguistic-signatures of both “Anne” and “Charlotte” because it matches not only “Charlotte’s” Jane on 16-18 tests, but also “Anne’s” Tenant on 13 tests. It is possible that Charlotte and Anne Bronte were truly authors, but it is more likely that there were two male collaborating professional ghostwriters desperate to make a living, who were experimenting with multiple bylines until they struck fame with the “Brontes’” feminine names.

20 faktorovich
Dec 4, 2021, 1:09 pm

>4 MarthaJeanne: Unlike all previous re-attribution studies of the British Renaissance, I provide all of the raw data (https://github.com/faktorovich/Attribution) and explain my exact attribution method not only in the book, but also in this interview. In the article about the "Brontes", I explain how previous computational-linguistic articles, such as Moshe Koppel, Jonathan Schler and Elisheva Bonchek-Dokow’s “Measuring Differentiability: Unmasking”, have indeed manipulated the numbers and demonstrate the statistical errors of omission, misreporting, and false-data. In contrast, my method has so many layers of different tests that it is near-impossible to manipulate or bias the resulting data. I would be delighted to test any mystery test you propose to prove the method works. You just can't cling to the original bylines being the correct answer, and instead be willing to consider that a ghostwriter, a pseudonym or the like might have been involved.

21 faktorovich
Dec 4, 2021, 1:11 pm

>6 Crypto-Willobie: I do not understand this slang expression. Are you stating that I have an enormous load of testicles? Is this a negative because I am a woman and you are being ironic?

22 faktorovich
Dec 4, 2021, 1:14 pm

>7 aspirit: You are all making what I would imagine are intended to be counter-arguments here - unless "bollocks" is indeed intended as a compliment. The counter-arguments have also been made across the previous 400 years of scholarly misattribution of these texts. I cannot imagine LibraryThing would want to publish a hit-piece that insults folks as embarrassments and suggests they are threatened by scholarly research. I was hoping for some rational discussion in response to this interview, but I guess scholarly discourse has been replaced with insults.

23 faktorovich
Dec 4, 2021, 1:16 pm

>8 lilithcat: I did not spam the forum. I posted an invitation to discuss this topic in one place other than the Author Hobnob area, and my comment was immediately deleted by you, who also sent several messages to me explaining that you are glad you censored my ability to speak in public. If your deletion of my post was rational and it is no longer there, are you bringing it up to make it sound like I am doing something wrong to discredit me over nonsense when you have no rational basis or evidence to counter my arguments?

24 paradoxosalpha
Edited: Dec 4, 2021, 1:19 pm

Even if the content of everything from that quarter to date didn't already support it, I'd look at the unbroken dump of >19 faktorovich: (and rapid-fire subsequent posts) with a growing suspicion of crankery. Still, I guess someone who suffers from this sort of logorrhea might "reasonably" conclude that vast swaths of literature were created by a much smaller number of authors than are reflected in their historical attributions.

While ordinarily I'd like to see the sort of intellectual charity exhibited by >1 AbigailAdams26:, I think AF's record on LT should forfeit it.

25 faktorovich
Dec 4, 2021, 1:40 pm

>9 DuncanHill: You do not understand my findings or what I said in the interview. First, other computational-linguistics scholars have claimed that the modernized "Lear" is by a different linguistic-author than the old-spelling "Leir". This point is not in contention. The statement you quote from me is saying that scholars have failed to recognize that erroneously comparing modernized with old-spelling versions has led to modernization being the reason these two and others suggest anonymous pre-"Shakespeare" versions of many of these plays were not by "Shakespeare". It is irrelevant if my versions of "Shakespeare" sound like me; in fact, I have made the same type of edits to all plays as those who have modernized "Shakespeare", so all of these Percy plays should now sound like Percy after modern editing, making "Hamlet" obviously recognizable as similar to Percy's self-attributed plays such as "Aphrodisia". I said the opposite of your claim that the style of a translation can be used to identify the original; in fact, I said that the original should never be compared for authorial-attribution with the modern version.

Here is a fragment from Volumes 1-2 where I explain the "food"-payment: The most common negative criticism levied by biographers against William Percy is that he had served time in debtor’s prison. In 1887, de Fonblanque appears to have become the first biographer to claim Percy spent time in Fleet Prison for debt without citing a supporting source. The debtor’s prison mention has been repeated in studies such as Gerald Brenan’s history of the Percys in 1902. Dodds imagines a scenario where William Percy would have been imprisoned after Henry was locked in the Tower, and creditors lost faith in William’s capacity for repayment without his brother’s backing. The piece of evidence that might have inspired de Fonblanque initial claim is the payment listed in the Accounts of the Earls of Northumberland in 1611/2 for £11 19s to “Mr. Bagwell the keeper of Oxford Castle for the charges of Mr. William Percy’s diet there.” Another £14.14s.4d. was charged for William’s “diet” in 1617-8. Dodds argues: “Oxford Castle was the town prison, and the most probable explanation of this entry is that William was imprisoned there for debt.” An alternative explanation for this expensive culinary bill is that William was working in an administrative capacity at this Castle, which also enclosed the administrative offices of this county. The Castle’s function changed to solely being a prison in the 18th century, before it was repurposed again into a museum in 1996. If William was imprisoned in this Castle, Henry’s bill would have included charges for the room, and other elements essential to make an extended stay bearable for a gentleman. Regardless of if this was an imprisonment or an employment, this payment confirms William resided in Oxford by 1611-2. Instead of considering William’s potential gainful employment at the Castle, biographers have been describing William’s time in Oxford as a retirement. Dodds traces this claim back to the largely fictitious or gossip-based account of Anthony Wood in Antiquities of the City of Oxford (1661-6?; 1889).

By "Rowse years ago did his 'Contemporary Shakespeare'" you seem to be referring to the Rowman series of semi-translations of "Shakespeare". What does this have to do with me creating the first ever translations of Percy's self-attributed plays or the anonymous plays. The only play this can apply to is the first quarto of "Hamlet" because the second quarto of "Hamlet" has been translated before, but as I explain in the book this first previously-untranslated "bad" quarto entirely changes the storyline.

It is absurd to make any re-attribution claim (for or against) based solely on the class of the author. My findings are that only six ghostwriters worked non-stop for extremely low payments across their lifetimes to publish this massive output of texts in the British Renaissance. Most of them did so because they were in desperate need for money, but Percy was acting to defend his family against the corruption that had executed/assassinated two previous Earls of Northumberland and also sent his brother (the new Earl) into the Tower. The problem with "Shakespeare" was not that he was poor, but rather that he was not a real person; "Shakespeare's" signatures are blatantly forged by Percy as they match the handwriting style in other linguistically Percy-authored texts.

LibraryThing exists to explain new relevant research with librarians. If all of their Renaissance shelves/books are misattributed, this is a cataloging problem that librarians have to be aware of. The rational approach to counter my arguments would be for you to actually read the full series first, as I am certain the overwhelming evidence in it would convince even the most staunch deniers.

26 faktorovich
Dec 4, 2021, 1:48 pm

>10 anglemark: If you have to post a public threat against an editor suggesting they will be fired for sharing a research-study you disagree with; you really must have no rational support for your absurd position. Scientology is a religion based on imagined fiction; my research study is based on overwhelming evidence including not only computational-linguistic data, but also forensic accounting, handwriting analysis and various other types of proof. Previous re-attributions of this corpus have indeed been psychic or intuitive, but my findings are the exact opposite of that as I make my case entirely based on a quantitative analysis and documentary evidence. I included the steps involved in my attribution method in the interview. You can test it on any texts of your own to see if you will confirm that it works or not.

27 paradoxosalpha
Dec 4, 2021, 1:53 pm

>26 faktorovich: (re: >10 anglemark:)

That wasn't a threat, it was an historical observation.

28 lilithcat
Dec 4, 2021, 2:02 pm

>23 faktorovich:

>my comment was immediately deleted by you,

As I previously explained to you, I did not delete your comment. I have no power to do so. Your comment was flagged by multiple people, and, under the rules of the site, that resulted in its disappearance.

you censored my ability to speak in public

Once again, I have no power to do so.

Why you chose to violate the Terms of Service and then got upset when there were consequences to those violations, I cannot fathom. Had you just joined, perhaps other members would have cut you more slack. But you have been a member of this site for several years, and should have been familiar with, and respected, its norms.

29 faktorovich
Dec 4, 2021, 2:09 pm

>11 Petroglyph: The Hebrew language had a significant early impact on many world languages. And Sanskrit has been acknowledged to be an ancient language that also influenced those that followed it. What does any of this have to do with my research?

You seem to have concluded ironically that the establishment is made up of idiots who have had a blind-spot for the British Renaissance as they have been propagating the greatness of "Shakespeare" via his pro-monarchy plays. Yes, ghostwriters are the downtrodden minority, who have received little credit, while achieving many of the accomplishments their sponsors have been applauded for. They are like the Sherpas who are hired to guide the incapable westerners to the top of literary mountains, only to remain a nameless title themselves.

Those who were refuting Einstein's theory of special relativity had to at least claim to have read Einstein's paper(s) on this subject; so why would anybody have a right to ridicule and discredit any researcher without at-minimum reading the research they are refuting? You have not even mentioned the computational-linguistic attribution steps I included in the interview to explain what specifically is wrong with this precise approach.

There are several typos in the Callaway article you cite. My Volumes 1-2 include a chapter where I examined the structural story arcs of all of the "Shakespeare" plays and proved that the Percy/Jonson division can be seen in their plots as well as in their linguistics. My explanation is far more sophisticating and logical than what Callaway does and it addresses the rules of storytelling in the literature field, instead of using NSC classifications that are more typically applied to the statistical study of cell behavior in biology. You seem to be claiming that Callaway is superior simply because scholars have agreed he is superior, and anybody who questions his superiority is not worthy of consideration. Instead of saying nonsense about how I walk, look or talk (which you cannot see given our distance and the textual method of communication), why don't you actually point out what specific elements of my findings you find to be quacking?

Do you think the sort of post you have added here should be highlighted by LibraryThing instead? Maybe something that jumps between anti-Semitism, nonsense and insults against a woman's looks?

30 faktorovich
Dec 4, 2021, 2:11 pm

>12 MarthaJeanne: If you had read my response, you would see the link to my latest bunch of reviews I did for PLJ: https://anaphoraliterary.com/journals/plj/plj-excerpts/book-reviews-summer-2021. You can see the names of the books I am reading there. The problem is not that I cannot name them, but rather that you cannot be bothered to read their names.

31 faktorovich
Dec 4, 2021, 2:13 pm

>13 amanda4242: "All kinds of wrong"? Are you referring to the insulting nonsense in these responses? There is nothing "dodgy" about my work. The problem is in your lack of respect for fellow humans' scholarly labor.

32 faktorovich
Dec 4, 2021, 2:18 pm

>14 Crypto-Willobie: As I explain in the book, "Philip Henslowe's Diary" indicates that many plays were written in 2 weeks after commissioning. This would mean 26 plays per year, and 1,560 plays in 60 years that was the length of Percy's literary career. There were only around 400 plays published during these decades, so there is absolutely nothing unbelievable about my findings. What I find to be unbelievable is that anybody can believe a writer can write only a single play in a lifetime and then never publish another play again despite continuing to live across the following decades. It is also unbelievable that "Christopher Marlowe's" byline appeared for the first time a year after his death, and yet claims of his authorship of many texts are accepted as believable. So, read my series, and then you will find out if you believe in my six-writers conclusion or not. If you dismiss it without reading the evidence, and while continuing to believe the current nonsensical attributions... you are doing yourself a disservice.

33 DuncanHill
Edited: Dec 4, 2021, 2:23 pm

>21 faktorovich: I question the competence of anyone claiming to write about the English language who does not understand the expression "what a load of bollocks".

>25 faktorovich: Not Rowman, Rowse. Any Shakespearean scholar will know him, even if they disagree with him.

34 faktorovich
Dec 4, 2021, 2:21 pm

>16 2wonderY: My research has been cited in 32 scholarly articles/books according to Google Scholar: https://scholar.google.com/citations?user=dJD72pMAAAAJ&hl=en. So, it is a fact that my analysis of literature has been appreciated by many, and it is untrue to call it "worthless".

35 Taphophile13
Dec 4, 2021, 2:27 pm

I didn't know that "LibraryThing exists to explain new relevant research with librarians," as stated in >25 faktorovich:. Where can I find out more about this, and does LT have other purposes that members may be unaware of?

36 faktorovich
Dec 4, 2021, 2:28 pm

>17 norabelle414: It is incredible how insulting your statement is without delivering any rational message. As I explain in my "Author-Publishers" book, the best British-American writers were also publishers/self-publishers, including Dickens, Woolf, Twain etc. etc. In the sciences, the best example is Galileo Galilei's "Dialogue Concerning the Two Chief World Systems" that he self-published before it was banned. Nobody in the establishment would have given a positive peer review to Galileo if he had pitched his Earth-revolves-around-the-Sun theory in contradiction with the previous idiotic model that the Earth was in the center of the universe. Galileo was found guilty by the Inquisition, so this is the company you are keeping if this is where you have chosen to go with your argument. The best science/literature has always been self-published because the establishment has always prioritized being celebrated for their brilliance over scientific facts, and the merits of true literary greatness.

37 MarthaJeanne
Edited: Dec 4, 2021, 2:45 pm

>20 faktorovich: When these findings are explained by someone who can write clear English, is employed at a major university, and publishes in a major publishing house, then I might go into them. Until then, I have no intention of wasting my time. >17 norabelle414: Sounds very rational to me.

38 faktorovich
Dec 4, 2021, 2:47 pm

>24 paradoxosalpha: You are arguing that because I am a quick writer this means that nobody else can be quick at writing. This is obviously false and the opposite of the truth. The speed of my writing only proves the normal speed achieved by a professional writer/ ghostwriter. You are focusing on censoring my research without addressing what my research is.

39 faktorovich
Dec 4, 2021, 2:50 pm

>28 lilithcat: Who the other censors might have been is unknown, but that you censored me is a fact that you keep repeating. My post invited discussion about research and did not violate any Terms of Service, which cannot be proven or disproven now since you and the other censors deleted the post in question. So it is pointless to argue about what it was unless you still have it and want to repost it here.

40 faktorovich
Dec 4, 2021, 2:53 pm

>33 DuncanHill: Rowman is the publisher. I added a more specific citation for the series you were referring to, and explained why your reference to it was unrelated to what I was saying about a lack of prior Percy translations. You have now also misunderstood my clarification, while making it seem as if I have misunderstood something. I hope all who are confused by this will refer to my previous response to DuncanHill. I understand the expression "what a load of bollocks"; I used ironic confusion to show the absurdity of the underlying meaning of this insult.

41 faktorovich
Dec 4, 2021, 2:55 pm

>35 Taphophile13: What exactly do you think a website dedicated to the Library have to share if not research about library science, librarians, and other related subjects? Why would chatting about low-brow pop fiction be anywhere as significant as a scholarly discussion about the underlying authors behind the British Renaissance?

42 lilithcat
Dec 4, 2021, 2:57 pm

>34 faktorovich:

You may have been cited, but not always with approval.

43 MarthaJeanne
Edited: Dec 4, 2021, 3:01 pm

>41 faktorovich: https://www.librarything.com/about

"LibraryThing is an online service to help people catalog their books easily."

Because, more people are interested in chatting about fiction than in the underlying...

44 lilithcat
Dec 4, 2021, 3:00 pm

>39 faktorovich:

that you censored me is a fact that you keep repeating.

No, it is a NON-fact that YOU keep repeating. I have said here, and in communications with you, that I have no power to censor anyone here, even if I wanted to.

I cannot understand how someone who considers herself a scholar can be so lacking in the ability to understand such clear statements.

45 faktorovich
Dec 4, 2021, 3:00 pm

>37 MarthaJeanne: You are making an appeal-to-authority logical-fallacy, together with a xenophobic insult directed at my Russian heritage without citing examples of any lack of clarity in my English. There is no rationally necessary order between a scholar publishing field-changing research and his/her employment by a "major university". In most cases, the ground-breaking research comes first, and the recognition of these findings and subsequent employment in the Ivy League comes afterwards. And two of my books have been published with McFarland (a major publishing house) nearly a decade ago. If you believe posting insults and slander is less of a waste of time than just reading the research in question; then, you obviously would not understand my books even if you read them.

46 faktorovich
Dec 4, 2021, 3:03 pm

>42 lilithcat: If you actually read the dissertations/books/articles that Google Scholar links to, you would find that all of these citations were with approval, including a special note of thanks to me for the review from an economist who said he adjusted the book before publication with help from my review of it in PLJ. I have never seen a negative comment about my work in a published text that mentions it.

47 DuncanHill
Dec 4, 2021, 3:05 pm

>40 faktorovich: The publisher was University Press of America, before they took over Rowman and Littlefield. As far as I can tell it was not republished under the Rowman and Littlefield name - I think the last printings were in 1987, and the takeover in 1988. Using a publisher's name which does not appear on the works or in Cauveren's bibliography is needlessly confusing, and suggests an unfamiliarity with the work. Perhaps if you had taken a little more time you would have expressed yourself more clearly.

Perhaps you should refrain from saying you don't understand something if you don't want people to believe you.

48 Taphophile13
Dec 4, 2021, 3:06 pm

>41 faktorovich: I mentioned nothing about pop fiction nor was I comparing the significance of anything to anything else. I am trying to find out where it says that the site is "dedicated to the Library" and that research, libraries and related subjects are the appropriate topics. Sorry if I don't meet those standards.

49 faktorovich
Dec 4, 2021, 3:06 pm

>43 MarthaJeanne: My study proves beyond-doubt that nearly all "author"-bylines across the British Renaissance are currently incorrect. So if finding this out is not something that eases or assists librarians with accurate cataloging of books; what would? And the number of comments in this thread so far clearly proves there is "interest" in discussing "the underlying..." or my study.

50 faktorovich
Dec 4, 2021, 3:08 pm

>44 lilithcat: I understand that you want to convince the group that I do not understand what you want to censor me, but have no power to do so, and you have not done so, but have also indeed done so.

51 faktorovich
Dec 4, 2021, 3:13 pm

>47 DuncanHill: As any researcher would have done, I looked up what you were referring to and came across this page: https://rowman.com/Action/SERIES/_/CSS/The-Contemporary-Shakespeare-Series. I mentioned Rowman simply to assist other readers with finding the series you were referring to, and not because it made any difference who the original publisher was. Why would you or I check Cauveren's bibliography to research the version of this series that is currently available for purchase/reading? As I explained the more important point is that this series is irrelevant to your argument that there have been translations of the texts in my series before.

52 MarthaJeanne
Dec 4, 2021, 3:14 pm

>49 faktorovich: I haven't heard anyone here saying that they want to discuss your study. I have heard many voices saying that they don't think LT should have showcased it.

>50 faktorovich: If you again try to write about any of your books in Talk outside of Hobnob, several members will flag the post, and your post will disappear. This is what happens to all advertising on LT.

53 faktorovich
Dec 4, 2021, 3:15 pm

>48 Taphophile13: I was paraphrasing your own quote: "LibraryThing is an online service to help people catalog their books easily." You don't see the relationship between my paraphrase and this line?

54 faktorovich
Dec 4, 2021, 3:17 pm

>52 MarthaJeanne: Yes, most of the comments posted here are not discussing any of the facts I bring up in the interview or in my study. This is instead an attempt to censor LibraryThing from publishing the completed interview without any rational reason given for such censorship other than the demands of the insulting mob.

55 andyl
Dec 4, 2021, 3:18 pm

>34 faktorovich:

And how many of those entries in Google Scholar have been in the field of computational linguistic analysis?

56 andyl
Dec 4, 2021, 3:19 pm

>50 faktorovich:

The problem there is that everyone else posting here understands how LT works. How the flagging mechanism works. It seems that you do not.

57 Taphophile13
Edited: Dec 4, 2021, 3:22 pm

>53 faktorovich: I did not say that; it was >43 MarthaJeanne: who said it and she was quoting from here: https://www.librarything.com/about. Please don't misattribute statements.

58 faktorovich
Dec 4, 2021, 3:23 pm

>55 andyl: As I explain in the interview, I have been working on related subjects to this series across the past 20 years. My "The Formulas of Popular Fiction" (11 citations) dissects both the structural and linguistic formulas of these texts, as do my other books like "Rebellion as Genre" (6 citations: dives into Scottish-English linguistics as well as the linguistics of the authorial styles), and "Gender Bias in Mystery and Romance Novel Publishing: Mimicking Masculinity and Femininity" (2 citations: explores the linguistic differences/similarities between male/female writing styles).

59 aspirit
Edited: Dec 4, 2021, 3:30 pm

>19 faktorovich: The contracted phrase "I can’t" only appears in “Anne Bronte’s” Agnes and in “Emily Bronte’s” Wuthering, confirming that the signature behind Wuthering is at least partially “Anne’s”… "I did not" appears among the top-6 phrases in every one of the 5 tested “Bronte” texts (in 4 of these as the 1st or 2nd most frequent)—“Emily”, “Anne” and “Charlotte’s” —and only appears in one other text, oddly enough, in “Thoreau’s” Walden

Many of us on this site are authors or avid readers familiar with English-language publishing processes, historical and modern. We're really supposed to pretend we can't see what's wrong with these assumptions? (ETA: Or, worse, we were expected to argue against all of it?)

What was the LT team thinking with this spotlight?

(Please note, faktorovich, that I'm not asking for you to answer. I have no interest based on your behavior in these forums to converse with you.)

60 faktorovich
Dec 4, 2021, 3:24 pm

>57 Taphophile13: The wording of your message indicated that you were offended about me contradicting your earlier statement.

61 Taphophile13
Dec 4, 2021, 3:27 pm

>60 faktorovich: You misread me. I am amused by all this and I think it is someone else who is taking offense.

62 faktorovich
Dec 4, 2021, 3:29 pm

The phrases "I can't" and "I did not" are referring to the top-6 most common 3-word phrases in these texts. The rest of the article explains this point. Authors tend to repeat some of their favorite phrases frequently. I prove this argument across Volumes 1-2 of the series as well as in this article about the "Brontes". My responses in this forum have been extremely polite and purely rational in comparison with the avalanche of insults hurled at me, and yet I would not "block" anybody for responding to my research even if such replies are nonsensical.

63 faktorovich
Dec 4, 2021, 3:31 pm

>61 Taphophile13: Given the wide range of insults posted, it would be very strange if I missed the intended offense entirely.

64 thorold
Edited: Dec 4, 2021, 3:56 pm

>19 faktorovich: It is possible that Charlotte and Anne Bronte were truly authors, but it is more likely that there were two male collaborating professional ghostwriters desperate to make a living, who were experimenting with multiple bylines until they struck fame with the “Brontes’” feminine names.

You seem to be missing the point we’re all making here, that extraordinary claims require extraordinary evidence. That is an extraordinary claim. It only becomes “more likely” if you have strong biographical evidence of the existence of the ghostwriters and their links to the Brontës. Otherwise, the accepted attribution, vouched for by a huge body of letters, memoirs, manuscripts and all the rest, has to stand, and at most you have exposed possible doubts about who wrote which text.

I’d guess that the conditions the siblings grew up in, playing writing games together and sharing most of their activities, would skew the data of their writing styles anyway.

65 AbigailAdams26
Edited: Dec 4, 2021, 4:18 pm

Hi All: My apologies for dropping this notice last night and only returning to the discussion now. I wanted to get the interview out, and to bring it to members' attention, but in the intervening time have not been able to get onto LT. Saturday is usually my out and about day.

I've been reading through the thread, and I'm seeing two main things. First, that many LT members do not agree with the results of Dr. Faktorovich's study (to put it mildly), and do not think LT should have highlighted it, with our interview. Second, that the conversation between Dr. Faktorovich and other members has not been going well, with accusations of censorship and insult on one side, and of spamming and prolixity on the other.

I will try to go through and answer specific concerns raised, but on a general note, I want to say that the decision to present this interview is on me. Tim gave me approval, but I was the one who assented to it, and presented the idea to him. I did so because I thought the thesis was interesting, and the method of analysis novel. I thought members might also be interested, whatever their eventual conclusions, as the books were offered in our most recent batch of Early Reviewers.

I am sorry if anyone feels this is not in the spirit and tradition of LT, or that I have highlighted ideas or members I should not. It certainly is not my intention to present the interview as LT's position on the literature of the British Renaissance. In case it needs saying, we don't have one. Or if we do, it has not been communicated to me. I simply saw an unusual idea and an unconventional approach, and thought it might be of interest.

All that being said, I also want to acknowledge that it is the right of every LT member to disagree with both the interview's conclusions and the decision to post it. I mention this because the idea of free expression has been touched on in the thread. I do not consider it an insult to me personally, or to the site, that the decision to post the interview should be criticized, or even lampooned; nor do I consider it an insult to the author that her study should be rejected, criticized, or even ridiculed. While we want to promote interesting and constructive discussion, people have the right to disagree and to voice that disagreement, just as the author has a right to defend her thesis. I don't think anyone is best served by assuming malicious intention, however strong the disagreement.

Regarding the issue of self-promotion/spam/censorship, LT does have specific rules in this regard, and posts that have received multiple flags will in fact be deleted, as mentioned above. Specific groups also have their own specific rules, and these rules should be respected.

Anyone wishing to discuss this with me privately is welcome to do so, through the usual channels: leave a private comment on my profile page (https://www.librarything.com/profile/AbigailAdams26), or email me at: abigailadams@librarything.com.

Dr. Faktorovich, I will be emailing you shortly.

66 Keeline
Dec 4, 2021, 4:05 pm

According to one source (Wikipedia):

The English Renaissance was a cultural and artistic movement in England from the early 16th century to the early 17th century.


How does the work normally attributed to the Brontë sisters ("The Brontës were a nineteenth-century literary family", from another Wikipedia page) get labeled as works of the English Renaissance? Or has the definition been greatly expanded and I did not notice?

I grant the failure to find accurate information on some Wikipedia pages, like "Edward Stratemeyer", but find these two to be well cited and probably accurate.

James

67 faktorovich
Dec 4, 2021, 4:08 pm

>64 thorold: Volumes 1-2 of my Re-Attribution series have 698 pages that provide "extraordinary evidence" of my re-attributions of the 284 Renaissance texts I tested. I quoted a segment out of my "Brontes" article that also addresses several other attribution mysteries. This "Brontes" article will be published this spring or a bit later in the Journal of Information Ethics, and then you can read the full set of evidence to see if you want to counter it. The Re-Attribution series is already available for sale on Amazon, and I have given away free copies of it to Early Reviewers on LibraryThing. If the games the three sisters were playing altered their writing styles to be similar, then they would have all registered a single style, but instead there are two styles between the three sisters. There are intersections between these two styles that also hint at some degree of collaboration, but there is definitely no third style in this mix.

68 faktorovich
Dec 4, 2021, 4:20 pm

>65 AbigailAdams26: Dear Abigail: Thank you for your rational response to the points raised in this discussion. I have not asked for anybody to be censored from this forum, even if their comments have offended me. I do not think this discussion has gone uniquely badly, but rather it is indeed very normal for the types of discourse history-changing research faces in the public in our modern world. I have not reposted any posts in any forum where my posts were deleted, and I am certain I did not violate "spam" rules when I made the post being referred to. The reason my post was flagged was obviously because this group disagreed with my findings and not because it was self-promoting, as this is what they have repeatedly stated in this discussion without quoting the post that they deleted to let it speak for itself. I hope anybody who wants to continue this important discussion will just address the evidence and claims I raise in my studies. I welcome all relevant criticism because similar questions can occur to other readers who are only reading these excerpts I am mentioning in the interview/these replies, without the rest of my studies. I am happy to explain my findings further, and this is what I hope the rest of this forum discussion will be about.

69 faktorovich
Dec 4, 2021, 4:25 pm

>66 Keeline: As I believe I mentioned before, I have applied the computational-linguistic authorial-attribution method I invented to texts between 1560 through around 1940 in different projects. The Renaissance Re-Attribution series is the set of 14 volumes (2,500 pages) that I have just published and was available for Early Reviewer reviews - in it, I tested texts between 1560 and 1662. I also wrote (but haven't published yet) a book (300,000-words) on the Re-Attribution of the British 18th century, including the Daniel Defoe byline. And I have written this forthcoming article on the "Brontes" for the Journal of Information Ethics that covers British-American texts between around 1850-1940. I have tested the method on different periods and authors deliberately to check that it is always accurate in its attributions. Let me know if I can clarify this further.

70 paradoxosalpha
Edited: Dec 4, 2021, 5:46 pm

>38 faktorovich: You are focusing on censoring my research without addressing what my research is.

Hardly. I'm just sitting here with a well-buttered barrel of popcorn. I have no interest in censoring you, and you can't compel me to take your intellectual product seriously when I consider it risible.

71 aspirit
Dec 4, 2021, 6:09 pm

>65 AbigailAdams26: Thank you for clarifying.

72 Petroglyph
Dec 4, 2021, 7:14 pm

Wow, this thread really blew up. But I'm posting in an epic thread, so I've got that going for me, which is nice.

>29 faktorovich:

"The Hebrew language had a significant early impact on many world languages. And Sanskrit has been acknowledged to be an ancient language that also influenced those that followed it. What does any of this have to do with my research?"


The cranks I refer to posit that Hebrew / Sanskrit / Tamil / Serbian /... is the mother tongue to Latin, English, Welsh, Chinese, Quechua... Not "related to", not "has had an influence on", but the ancestor to! This just cannot be true based on everything we know about these languages and their history; and, furthermore, these cranks make their claims based on crappy methodology and motivated by unscientific concerns. "What does any of this have to do with my research?" Well, the very first words of my comment were "This is the literary equivalent of those linguistic conspiracy theorists who ...". Your work is strongly reminiscent of those crackpots because it uses the same kind of argumentation (a complete up-ending of established research based on a single person's idiosyncratic methodology) and the same lines of defense. I recognize the flags, is all.

"You seem to have concluded ironically that the establishment is made up of idiots who have had a blind-spot for the British Renaissance as they have been propagating the greatness of "Shakespeare" via his pro-monarchy plays. "


Don't put words in my mouth. I never said anything of the sort. Don't create strawmen.

"Yes, ghostwriters are the downtrodden minority, who have received little credit"


Sure. I'm broadly sympathetic to that claim. My issue lies with the embedding of this concern within extraordinary claims and uncritical acceptance of the results of some self-published non-peer-reviewed stylometric pipeline. Your desire to highlight the plight of ghostwriters and to demand respect for their unrecognized work does not mean your wide-ranging rewriting of literary history is correct. In fact, wanting a result to go one way should be an extra cause for concern if the results indeed do go that way.

"Those who were refuting Einstein's theory of special relativity..."


You're not Einstein. This is an exaggerated comparison that makes you look like a victim, like an unrecognized genius who was eventually proven right. It's a flattering image that doesn't alter the truth level of your hypotheses, merely the appearances.

"reading the research they are refuting? You have not even mentioned the computational-linguistic attribution steps I included in the interview"


Please provide me with the doi of the paper in which you've published your algorithm and I will! I have full access to two well-stocked university libraries with all their journal subscriptions and to ILL.
Please don't ask me to pay for your self-published book. That's not what a proper scholar would do.
If I were to discuss the "computational-linguistic attribution steps" in the interview -- which are likely to be incomplete and abridged and dumbed down etc. for public consumption -- I'd likely be confronted with "well, it's better explained in my book anyway". So no. Give me a doi or let this go.
Remember: you're the one making the (extraordinary) claims. It's up to you to prove them right. Not up to me (or other members here, or literary scholars and digital humanists) to fail to disprove them.

"There are several typos in the Callaway article you cite"


1) Irrelevant to the results. 2) A blog post has different standards than scholarly books. 3) Many reviews of your works here on LT mention poor editing and spelling errors. I'm not holding that against you either. Spelling is totally irrelevant. Honestly.

"My explanation is far more sophisticating and logical than what Callaway does and it addresses the rules of storytelling in the literature field, instead of using NSC classifications that are more typically applied to the statistical study of cell behavior in biology."


More sophisticated also means more places where things can break. Look, the relative sophistication is not an issue here. I brought up Callaway's "fun experiment" (as I called it) as an example of the kinds of neat things computational methods can bring to stylistics. Also, to those of us working in digital humanities and linguistics and stemmatology it is absolutely bog-standard to alter phylogenetic models from biology to model cultural and textual evolution. That really isn't a weird thing to do, like at all.

"You seem to be claiming that Callaway is superior simply because scholars have agreed he is superior, and anybody who questions his superiority is not worthy of consideration."


Nowhere have I said this. This inference is entirely yours. Again, don't put words into my mouth. No strawmen, please. Plus, it sounds like you're reaching for that unfairly maligned victim role again.

(Callaway is a she, btw.)

"Instead of saying nonsense about how I walk, look or talk (which you cannot see given our distance and the textual method of communication), why don't you actually point out what specific elements of my findings you find to be quacking?"


DO NOT PUT WORDS INTO MY MOUTH. There's an idiom in English that goes "If it looks like a duck, and walks like a duck, and quacks like a duck, chances are it is a duck". It roughly means that if something (or someone) gives off all the signs of being an X, people are justified in calling that thing an X, even though looks may be deceiving, even though you shouldn't judge a book by its cover. Actions speak louder than words.

So. I did not say you look or walk like a quack, I said "Faktorovich's work, in contrast, looks, walks and talks like quackery" (emphasis added). That means, your work gives off all the signs of being unscientific poppycock, and so that's exactly what I'll call it. If your peer-reviewed algorithm convinces me otherwise, I'll retract this.

"anti-Semitism, nonsense and insults against a woman's looks"


DO NOT PUT WORDS INTO MY MOUTH! Don't try and paint me with filthy brushes to make me look bad. Or in other words: no strawmen, please. You created several strawmen in your post in your haste / eagerness to paint yourself as the unfairly maligned victim, the soon-to-be-vindicated Genius, the righteous Defender of the Downtrodden. This is not how a proper scholar does it.

I will no longer respond to strawmanning and victim-playing. Give me the doi's I asked for, and leave me to read your work at my convenience.

73 norabelle414
Edited: Dec 4, 2021, 7:58 pm

>36 faktorovich: You're calling me the same as the Roman Inquisition because I said I don't think LibraryThing should promote your self-published research without a second opinion? I work in research ethics and I assure you we do not execute people for not getting a peer review. We just don't let them publish their work in reputable journals until they do.

74 faktorovich
Dec 4, 2021, 9:01 pm

>72 Petroglyph: If you are going to dismiss any researcher as a "crank"; you have to at least cite this researchers name and the name of the article/book where this research appeared, so that I and everybody else can look up the proposed theory to test if it is indeed absurd or if there is some rational basis for this argument. It is irrational to dismiss any theory simply because you don't believe in it immediately upon reading a gist of what it is about.

A "conspiracy" can be the simple act of plotting or conspiring in any endeavor; some conspiracies are irrational, while others are entirely rational/ scientific or have a specific political goal that can be either achieved or not. You are using the term as if it is a fact that all conspiracies are false and fraudulent, but the American Revolution started and ended as a conspiracy. The main point I find revealing in this paragraph is: "a complete up-ending of established research based on a single person's idiosyncratic methodology". I mentioned Galileo before - he was alone in his conclusions about the relationship between the Earth and the Sun. His method was just as simple as my own - he looked at the sky and studied the movements of the planets and the sun. I am simply applying a group of 27 different linguistic tests in a systematic manner that reveals the quantitative similarities and differences between the linguistic styles of authors. Galileo would have had a calm career in academia if he had ignored the facts he saw in the sky, and just agreed with the established theory; instead he spent most of the rest of his life in confinement, and his ideas are still not fully accepted by some in our modern times who have argued the Earth is flat. If I had argued that I found strange numbers written on the back of my head, or had seen a dream about the Renaissance; you would be justified to call my findings idiotic. However, the data on the GitHub page I cited applies 27 tests X to 284 texts with X millions of words and compares all of these texts to all of the other texts. There are tables that explain the structural patterns, and tables that cite the original publishers/ performance information. This is the largest study of its type to be applied to the Renaissance; it responds to previous computational-linguistic methods and explains why these earlier attempts failed. Have you tried following the steps I provided? What about these simple steps is anything but simple statistical mathematics that are as irrefutable as 2+ 2? There are moments in scholarship when old ideas are proven to be wrong by new researchers; dismissing all conclusions that contradict established history/science is far more conspiratorial than allowing the math speak for itself.

While you have been misinterpreting my statements, I have summarized your points precisely; you did not use the word "idiots", but it was implied.

Here is how scholarly dialogue is supposed to work: 1. Scholar A proposes an idea that changes the history of Britain, so it is published in a scholarly journal that welcomes such dialogue. 2. Scholar B reads the full book/article Scholar A has published, and then does additional research and publishes an article that refutes or confirms Scholar A's findings. 3. Then, there is a rebuttal or contributions from other scholars that debate the evidence to weigh if it is conclusive on either side. Instead, in modern academia: 1. Scholar A proposes a history-changing idea that is immediately and irrationally rejected from all "insider" scholarly journals and publishers by Scholar B, who sends insults and nonsensical reasons for the rejection that show they did not read the research in question, and are simply rejecting it because they have noticed it proves their own research to be wrong. 2. Scholar A has to either trash their findings, or self-publish them. 3. If the work is self-published, all of the major review journals that are read by librarians who make purchasing decisions reject the work, without even opening the book, simply because it is self-published. So we are now at this final stage. And you are seriously arguing that this system is right to block all new and history/science-changing research from reaching the public?

It was not at all my goal to discover that ghostwriters were behind "Shakespeare", as I had hoped this was a real person and most of the bylines were accurate. The numbers just proved this intuitive desire in me to be wrong.

I am indeed neither Einstein nor Galileo. I have not been arguing that I am a man from previous centuries. If I were Einstein, would you have similarly responded to his pre-fame papers by insisting that, since he is not Galileo and has not already been accepted as a great scientist, his theories are not worthy of your research before you dismiss them as irrational?

I would be delighted to send a free review copy of the entire series to you via email - if you give me your address. You can request it by emailing me at director@anaphoraliterary.com. I am actively seeking reviews on LibraryThing, so your willingness to review the research is appreciated. Volumes 1-2 are 698 pages long, and you have to read this full book before judging my method, as the entire thing provides different types of evidence that re-affirm the central linguistic findings. The simple steps involved in the method are included in the interview that this thread is about: https://blog.librarything.com/2021/12/an-interview-with-scholar-anna-faktorovich.... I deliberately simplified the steps for this interview to allow all members of the public to use this method, without reading the full book, to solve mysteries of their own.

The reviews of my work on LibraryThing have been libelous, as there are no spelling, grammatical, or similar errors in my books. They never cite any errors, and instead just use broad insulting language to dismiss my books as unworthy of public consumption. This is similar to the unspecific dismissals of my research in this conversation. I have not filed complaints about the falsehoods in these reviews because I believe all speech should be heard, even if it is false, malicious, and harmful to the person being attacked. In contrast, the Callaway article includes absurdly obvious typos, such as parentheses () with nothing enclosed in them, and such errors are absolutely unacceptable for any posted text in a blog or elsewhere. My knowledge of English spelling and grammar would become apparent if you read Volumes 3-14 of my Re-Attribution series, which include extremely detailed notes on language use to assist readers with understanding my translations from Early Modern English to Modern English. I have written thousands of words in this discussion today, so anybody who is unfamiliar with my work can also check what I have written here to see if I have made any mistakes even when speed-writing on a blog.

It is in fact common for computational-linguists to adopt biological or other hard-sciences approaches to testing the authorship of literary texts. My study explains that this is a mistake. The science of literary composition has very different rules from how nature makes biological cells.

If you had not made claims of quackery before actually reading even the interview where I summarize the method, you would not have to retract your unsupported opinion when it is proven wrong. Yes, you can take your time with the review of my work, but I am assuming you have given yourself an out by stating that you will not consider my research unless other scholars have already stamped it with agreement; as we discussed previously, I have indeed self-published this series, so by that simple fact it obviously did not go through peer review. Victim-playing? Imagine a scenario where we were talking about a rape or an attempted murder, and you were insisting that the victim speaking up about an attack was a case of "victim-playing". The statements made in this discussion are public record, so let others decide if I am a victim or not. Don't divert the discussion into the irrelevant topic of my victimhood; instead, just read my research and respond to it rationally in a review that actually shows you are familiar with what you are critiquing.

75lilithcat
Dic 4, 2021, 9:40 pm

>65 AbigailAdams26:

Abby, I have a couple of questions for you.

1. How does LT choose which authors are interviewed?

2. This book has been described throughout this thread as "self-published". Indeed, the author herself says that it is. (See >74 faktorovich:: "as we discussed previously I have indeed self-published this series,"). Yet the ER rules clearly state that "Early Reviewers limits participation to select publishers." So how does her admittedly self-published book come to be an ER offering? Was an exception made? If so, why?

76paradoxosalpha
Dic 4, 2021, 10:25 pm

>75 lilithcat:, re: >65 AbigailAdams26:

Indeed, other self-published authors are curious!

77Crypto-Willobie
Dic 5, 2021, 12:20 am

>21 faktorovich:
It's a British idiom for 'a lot of nonsense'.

78SandraArdnas
Dic 5, 2021, 4:32 am

Wow. The author's comments here would thoroughly put me off reading her work even if interested in the topic. Inability to hold a reasonable discussion and resorting to emotional claptrap instead is not exactly something that recommends her scholarly work.

79anglemark
Dic 5, 2021, 5:56 am

I'm for the first time ever sorely disappointed in LibraryThing's staff. What's next? Cold Fusion? Flat Earth?

80andyl
Dic 5, 2021, 6:52 am

>74 faktorovich: I mentioned Galileo before - he was alone in his conclusions about the relationship between the Earth and the Sun.

Except he wasn't alone, and incidentally he wasn't first either. There were quite a number of people who expounded a heliocentric view of the universe.

81faktorovich
Dic 5, 2021, 10:42 am

>75 lilithcat: I am an author-publisher. I own and direct Anaphora Literary Press, which has published over 300 different titles by different authors, dozens of which have been offered through Early Reviewers on LibraryThing. It is thus both true that I have self-published and that I have been published by a publishing company. I explain in the "History of British and American Author-Publishers" that most of the best authors (Dickens, Twain etc.) published themselves with their own publishing companies. The rule against self-publication is intended for those without publishing experience, who are more likely to leave mistakes and create unprofessional books. When this rule is used by reviewers (as I have seen at many major newspapers/magazines) to bar all small publishers, all self-published authors, and author-publishers, it bars access to the press without considering the quality of the texts being pitched.

82AbigailAdams26
Dic 5, 2021, 10:46 am

>75 lilithcat: Your first question is a little difficult to answer, as I have only participated in a few interviews thus far, and don't have a large sample size. I would say the best answer would be that there is no set formula for the choice. That being said, it is at least partly driven by books or projects the LT staff find interesting, by what is current and notable in the book world, and by who we know or are acquainted with. We don't limit our interviewees to authors, but also aim to interview booksellers (we have one such interview forthcoming), publishers, and other figures from the book industry. If you have specific interviewees you would like to see, by all means send them along to me, although I cannot guarantee that they will ever appear on the blog.

In terms of this specific choice, I was approached by Dr. Faktorovich about the possibility of an interview, and being aware that she had contributed thirteen titles to our recent ER batch, and that these books represented a major project she had undertaken, thought it might be an interesting topic. I was intrigued, partially because the idea of this kind of computational analysis of literature was foreign to me, and partially because I appreciated that the project was making hitherto inaccessible texts (the works of William Percy, as well as those anonymous works attributed to him by Dr. Faktorovich) available to the reader of today. I have sympathy for such projects—I greatly admire The Other Voice in Early Modern Europe series, for instance—and as the only person in my master's cohort who worked with older texts—I wrote my thesis on three centuries of Reynard retellings for children in the Anglophone world—I appreciate the kind of archival research it must represent.

That being said, no interview presented on our blog, or excerpted in SOTT, represents LibraryThing's view on said subject, nor should our presenting of any interview be read as endorsement. We try to pick subjects that are of interest to readers, that is all.

2. Policing the border between publishing and self-publishing is actually rather complicated. Our policy, as explained to me when I started, was that books would not be eligible for ER if they were self-published, whether through some kind of pay to print company, or through a vanity press which only deals with one author's work (publishers set up by individual authors to publish their works alone). Small and independent publishers are eligible for ER if they print the work of more than one author. Dr. Faktorovich's publishing house, Anaphora Literary Press, falls into this latter category, and has been participating in ER for some years now, I believe.

I think the confusion here lies with the use of the term "self-published." In a sense, these books are self-published, as the publisher is one owned (I presume?) and run by the author herself. But they are not self-published, as we define the term for the purposes of ER. I hope this clarifies matters.

83faktorovich
Dic 5, 2021, 10:47 am

>80 andyl: I am certain that Galileo was indeed alone in his findings in his lifetime, even though such theories had previously been vaguely proposed without the scientific proof needed to make them believable. If, as you claim, he was not alone and there were others supporting his position, then obviously I am at a far greater disadvantage than Galileo, as I am indeed alone in my findings, and nobody else has yet read far enough into my research to join my view of British history.

84lilithcat
Dic 5, 2021, 11:07 am

>82 AbigailAdams26:

Thank you for the clarification re: "self-published".

85faktorovich
Dic 5, 2021, 11:07 am

>82 AbigailAdams26: Dear Abigail: Thank you for your kind remarks, and for the thorough explanation of your policy. Few of the newspapers/review magazines I have pitched this project to for review have even noticed Volumes 3-14, which translate previously inaccessible texts; instead, they tend to stop at the idea that six ghostwriters wrote the Renaissance and dismiss the project without reading the details of how I arrived at this conclusion. While most readers in the general public assume that the British Renaissance has already been over-translated and over-analyzed, specialists in this field learn that the most historically interesting texts from these decades are inaccessible because they are handwritten, transcribed in old spelling, or published with few and frequently inaccurate translations. I primarily added the translations to this series because they allowed me to find a few new pieces of evidence to confirm my six-ghostwriters conclusion in every page of annotations, but as I read the translated versions of these texts, I realized these are great literary treasures that are more deserving of public attention than canonical "Shakespeare" texts.

86susanbooks
Dic 5, 2021, 3:22 pm

>49 faktorovich: "My study proves beyond-doubt"

Right there you reveal yourself as a schlock scholar. You're embarrassing yourself with all of the replies at this point. For your own self-respect, you should stop.

87susanbooks
Modificato: Dic 5, 2021, 3:34 pm

>66 Keeline: "How does the work normally attributed to the Brontë sisters ("The Brontës were a nineteenth-century literary family", from another Wikipedia page) get labeled as works of the English Renaissance?"

Don't you see? Despite volumes of letters, diaries, manuscripts from publishers, friends, the authors themselves, the use or lack thereof of the simple phrase "I can't" proves that two guys wrote everything from the English Renaissance, including those previously assigned to those canonical Renaissance authors, the 19th-century Brontes. /s

This is so ridiculous. As >64 thorold: said, extraordinary claims require extraordinary evidence. I'd be embarrassed if one of my first-year students advanced such a thesis & defended it as you are. You must have taught comp. Go look at one of your old books and remind yourself how to argue a point honestly & convincingly. You're showing absolute contempt for this audience. Don't be surprised if they return it.

88faktorovich
Dic 5, 2021, 4:15 pm

>86 susanbooks: "Beyond-doubt" is a legal term: "Beyond a Reasonable Doubt: The standard that must be met by the prosecution's evidence in a criminal prosecution: that no other logical explanation can be derived from the facts except that the defendant committed the crime, thereby overcoming the presumption that a person is innocent until proven guilty. If the jurors or judge have no doubt as to the defendant's guilt, or if their only doubts are unreasonable doubts, then the prosecutor has proven the defendant's guilt beyond a reasonable doubt and the defendant should be pronounced guilty. The term connotes that evidence establishes a particular point to a moral certainty and that it is beyond dispute that any reasonable alternative is possible. It does not mean that no doubt exists as to the accused's guilt, but only that no Reasonable Doubt is possible from the evidence presented" (West's Encyclopedia of American Law). The 698 pages in Volumes 1-2 and the annotations and introductory explanations in Volumes 3-14 of my British Renaissance series jointly meet this standard. The evidence I provide shows that the handwriting styles match the linguistic styles of the ghostwriters. I have also provided legal documents that support the assertions I am making from these decades. And various types of forensic accounting and biographical timelines that further strengthen the re-attributions. Anybody who actually reads this series' facts will be convinced that "no other logical explanation can be derived" than that these six ghostwriters committed fraud by using multiple pseudonyms, and generally wrote if not all of the texts from the British Renaissance, then certainly all 284 of the tested texts. The reader as juror/judge would be left without "doubt", with the exception of "unreasonable doubts" (including all of the intuitive comments that have been made in this discussion regarding folks having blind faith in "Shakespeare" being a real author). I have written extensive chapters on why these six specific ghostwriters are the only ones who had motive, opportunity, and had lifetimes (birth/death dates) that fit the timeline of their published texts. I have considered hundreds of other alternative bylines as potential ghostwriters, testing 104 of these bylines that were the most likely culprits given their biographies, and have excluded all but the six ghostwriters as the culprits. If this response is "schlock" (meaning: inferior); you have an ironic sense for this term.

89MarthaJeanne
Modificato: Dic 5, 2021, 4:21 pm

'Beyond doubt' is not the same thing as the legal 'beyond reasonable doubt'.

This also has nothing to do with cataloguing, as that is based on what is printed in the book. A librarian may make a note of outside information, but catalogues what is printed in the book.

90raidergirl3
Dic 5, 2021, 4:25 pm

>83 faktorovich: oh this hurts my physics head. Nicholas Copernicus proposed and published his heliocentric theory 21 years before Galileo was born. Galileo and Kepler both worked to prove and support Copernicus. Copernicus’s book contained such important astronomical data that the Catholic Church wouldn’t ban it, because they used his data for calendars.
In your analogy, you are waiting for a Galileo to further support your thesis. Galileo never gets the credit for the heliocentric theory.

91faktorovich
Dic 5, 2021, 4:38 pm

>87 susanbooks: You have not read my response to the nonsensical point about the Brontes that Keeline has made. As I explained before, the excerpt about the Brontes was from my separate essay on texts from 1850-1940, and not from the Renaissance series that this interview with LibraryThing was about. The journal article had a word-limit, so I did not have the word-space to explore whether handwriting analysis would match my linguistic findings. Now that you mention this point, I looked into it briefly by reading a Guardian article on this topic. The male names that initially appeared on the "Bronte" novels were “Currer”, “Ellis” and “Acton Bell”. Curiously, Arthur Bell Nicholls (Charlotte Bronte's widower) was named as the person who sold "Brontes'" manuscripts in 1895 to a "literary forger" called Thomas James Wise. The echo between Nicholls' middle-name and the three initial male names' last name is not likely to be accidental. The sale of these documents to a forger makes it very likely that the documents did not actually exist prior to 1895, and were instead created by the forger to strengthen the case for the authenticity of the Bronte sisters as authors. As the surviving husband of one of the Brontes, Arthur Bell Nicholls would have seen a fiscal benefit from promoting their authenticity and otherwise marketing them as great writers. As Sotheby's has explained, the manuscripts attributed to "Emily"/"Charlotte" that came through or from the forger's hands then disappeared from public view until near the present moment, when they finally came up for sale, but they have not been digitized for public access to allow all scholars to evaluate their handwriting in comparison with both Arthur Bell Nicholls's and Wise's handwriting styles. There are only a few scribbles from "Emily" in this collection, and certainly not enough evidence to support the volume of shorter writing projects she would have had to produce to become the novelist she is claimed to have become. The use of "I can't" is only one piece of evidence; the bulk of the evidence is in the linguistic-data file that I linked to on GitHub, which you did not open.

92faktorovich
Dic 5, 2021, 4:52 pm

>90 raidergirl3: You are at the same time not saying anything new, and what you are saying is nonsensical. As I stated before, there were "heliocentric" theories proposed before (one of them by Copernicus), but they did not provide the mathematical/planetary evidence that Galileo provided in his self-published book that moved this theory into being a mathematical fact proven beyond reasonable doubt. Kepler lived in Germany, while Galileo lived in Italy. Kepler is known to have sent a book to Galileo, but the two did not work together. The Catholic Church only exonerated Galileo from his condemnation on heresy charges in 1992, so they very much did indeed ban Galileo's book. Even if the Church used Galileo's data to update their calendars in his lifetime, this did not contradict their insistence on imprisoning him for his research.

93raidergirl3
Dic 5, 2021, 5:36 pm

>92 faktorovich: I didn’t mean to imply that Galileo and Kepler worked together; they were simultaneously proving the theory. I’d argue Kepler’s work did more to mathematically prove heliocentricity.
I used a pronoun instead of the name - it was Copernicus’s book which wasn’t banned due to its valuable data.
We just disagree on Galileo’s role in this. And since the church didn't exonerate Galileo til 1992, I guess his theory wasn’t proven to everyone beyond a reasonable doubt.

94faktorovich
Dic 5, 2021, 7:26 pm

>93 raidergirl3: Galileo's theory was indeed proven beyond reasonable doubt. The problem was that the Church had unreasonable doubt that was rooted in its theological "history" of the universe. Because the Church was the judge/juror over Galileo's potential posthumous exoneration, they were not obligated to exonerate him as long as they had the power not to do so. Insisting the Earth was the center of the universe was more important to the Church than the fair execution of justice, or they did not see their faith-based belief in Earth's centrality as unreasonable until 1992.

95reading_fox
Dic 6, 2021, 4:09 am

11> "Callaway's fun experiment in untangling the two authors of Good Omens is a good example of that"
Fun - thanks for sharing that!

My very 'wet' science building has a 'dry' team doing automatic analysis of journal articles: not so much looking at authorship but trying to parse meaning and collate systematic reviews.

96andyl
Dic 6, 2021, 4:46 am

>90 raidergirl3:
Yep, and that doesn't include Thomas Digges, who in A Perfit Description of the Caelestiall Orbes went a bit further than Copernicus. Other astronomers who helped develop the heliocentric model, such as Michael Maestlin, could also be added to the list.

97ErlendSkjelten
Dic 6, 2021, 8:13 am

It is also worth noting that Galileo did not prove his theory beyond reasonable doubt, because his theory was wrong. He championed a Copernican model with perfectly circular orbits, a theory that did not fully fit even the facts observable at the time, and certainly not the facts we can observe today. The Church had scientific consensus on its side in rejecting his model. Of the many different models under discussion at the time, it was the Keplerian, with elliptical orbits, that turned out to be most correct, but that was not what Galileo argued, nor was it obviously the right one with the evidence available in the early 1600s.

98faktorovich
Dic 6, 2021, 1:24 pm

>97 ErlendSkjelten: A theory is a system of ideas designed to prove something. In a healthy scientific scholarship environment, all theories must be updated, revised and corrected in each generation that follows the initial theory design. With millions of scientists working across the world today, and thousands in this narrow field, it would be very strange if they did not find at least one idea in any 400-year-old theory that was erroneous and needed to be changed to fit new data, analysis and scholarship. Galileo's approach to measuring the movement of the planets, the ocean tides, and various other observable phenomena was the component that made Galileo's book uniquely logical and scientifically convincing for readers. It was banned by the Church because this observation-based science threatened the Church's insistence that it instead knew the truth about the universe's creation through divine inspiration. The Church's "scientists" were theologians who retold the mythology from the Bible about Earth's centrality without observing the facts about the universe's nature. "Consensus" just means agreement; a group can all agree to state that they believe in a religion, or to believe in the Earth being flat, or to believe in anything else; just by repeating this agreed-on point, the group has consensus. Thus, the presence of consensus in a scientific, theological or any other type of group does nothing to prove the thing being agreed on; it simply means that anybody who disagrees with the thing agreed on cannot join the group, because membership requires consensus. Galileo's refusal to consent, and his willingness not only to decline to join the scientific community of his day, but to put their errors on trial as he faced the legal system for his disbelief in the consensus, is the martyr-step that made it possible for later generations of astronomers to join Galileo's theory and to form a new group with this different consensus. Kepler was one of these astronomers who was helped by Galileo's rebellion, because despite religious tensions (with Galileo's theory in print) he retained a position as the imperial mathematician and was not prosecuted for his mathematical/astronomic conclusions.

99prosfilaes
Dic 6, 2021, 3:43 pm

>98 faktorovich: Wow, we're going to argue Galileo here? The Tychonic system seems like the most reasonable one with the knowledge available at the time; as Tycho Brahe pointed out, if the Earth moved, the lack of any apparent parallax to the stars, combined with their visual size, would mean they were incredibly huge (far larger than the Sun) and unbelievably far away.

To continue the analogy, it's not enough to make a good case from one direction; you also have to address the sticking points of those who disagree. Personally, I find it all too neat. Six authors wrote everything under a bunch of different names? "The long tail" comes up quite often in modern discourse; no matter what the top end looked like, I'd expect many authors who wrote one or two books.

Where's the verification step? You don't seem to have ever run it over a known set of works and checked that it produces the expected output. Can it distinguish Conan by Robert E. Howard, L. Sprague de Camp, Andrew J. Offutt, Robert Jordan, Leonard Carpenter, etc.? It's easy to toss together a system that looks good and produces output you like, but it's much harder to produce a system that's correct and that you can show to be correct.

100Petroglyph
Dic 6, 2021, 3:54 pm

All this because "being right in the face of adversity" is apparently a transitive property when comparing yourself to Big Names. I hesitate to name a famous person from centuries ago who was proven very much wrong for fear of triggering more argumentative digressions and well-akshully.

101Aquila
Dic 6, 2021, 4:28 pm

I saw this on twitter yesterday and it seemed apropos:

"You know, it's not enough to say the opposite of what everyone else is saying to be Galileo Galilei. You also have to be right."

I also want to know how many false negatives the system gives when looking at books we know are by the same person. Can it distinguish between Iain Banks and Iain M Banks? What does it say if it can?

In the Good Omens example they compared it on single author works first - and it's presumably reproducible.

I can't think why three sisters who grew up writing stories for each other in a shared world might share similar writing styles and word usage.

102faktorovich
Dic 6, 2021, 4:52 pm

>99 prosfilaes: As I mentioned earlier in this discussion, I have proven the case for the six ghostwriters by providing not only computational-linguistic evidence, but also other types of evidence from these decades in forensic accounting, handwriting etc. documents, in structural patterns between the texts, and hundreds of pages of other types of proof. I have also tested other corpora (18th-20th century) to verify that the attribution method works. The "expected output" being judged to be the "correct" answer by previous computational-linguists is the reason they have been making errors in their re-attributions. If academia only judges a re-affirmation of the current bylines to be "correct", while any contradictory evidence is judged to be a faulty method, then past attributions are repeated even if they were initially made intuitively and without any byline or biographical evidence to support them. Computational-linguists are thus pressured to alter their methods and data until the output repeats the original bylines with over 90% accuracy. Based on my re-checks of these methods, these computational-linguists never give the full method they are using, or the full raw data set and the steps the data went through, in their papers. When they do give this information, I have proven in various articles/chapters that they clearly are not reporting the actual data, or are only reporting the pieces that fit their primary bias of needing to re-affirm the established bylines. In contrast, all of my data is available for free on GitHub, and I have summarized the basic steps anybody can use to check this data in this freely available interview with LibraryThing. The test you propose is irrational and impractical because: 1. Currently only books first published before 1926 are in the public domain, and the first book in the Conan series was published in 1967; so it is likely to be impossible to find digitized full-text versions of enough of these texts to test them. 2. Given that these are pop-fiction in a narrow genre with a single storyline, my tests so far would suggest that it is very likely all of these stories were ghostwritten by a single ghostwriter (not necessarily anybody named in the bylines); so there is not likely to be any linguistic differentiation between Conan stories with the Howard byline vs. the de Camp and Carter, or Howard and Carter bylines. Therefore, even if you could create digitized testable versions of these texts, the answer linguistic testing would generate is not likely to be what you would want to hear. It is also possible there are indeed three or more different linguistic signatures in this mix, but only testing Conan texts would not be enough to verify if these signatures definitely belong to the authors in the bylines, as any or all of them could have purchased ghostwriting services. Thus, a fair and precise test would be to test most of the texts in this fantasy genre published across these decades by hundreds of different bylines, and then to compare biographies to the linguistic results. I have deliberately avoided authors not yet in the public domain or those who might still be alive, or whose editors might still be alive, as such testing is likely to become personal, instead of fixing a problem of past historical attributions that have already been subjects of scholarly re-attribution discussions.
My tests are designed to be used by anybody in the public who wants to test any group of texts; because these tests are simple mathematical calculations, obviously any biases I might have would be irrelevant to the group of texts and the results end-users come up with.

Therefore, if you have a desire to test whether my conclusion about the six ghostwriters is correct, just email me at director@anaphoraliterary.com and I will send back to you a free review link to the 14 volumes of this series; you will be able to see the verifying evidence I have already provided there, and then you could discuss with me any points that you might not agree with. Everybody interested in reviewing this series is invited to email me for these pdf review copies.

More than half of the texts published between 1560 and 1650 were initially anonymous, but an absurd number of these anonymous texts have since been re-attributed by scholars to an array of different authorial bylines, or at times to people who never had their byline placed on any text during their lifetimes. As I explain in several chapters in the book, vagabond laws made independent authorship as a profession illegal; all authors had to be patronized by a feudal aristocrat who approved their employment. Publishing was controlled by a handful of official monopolies that were initially granted by Elizabeth I (including the one given to one of the ghostwriters, William Byrd, in music/poetry publishing between 1575 and 1596, meaning that nobody could legally publish books in this field without Byrd's permission; the monopoly was transferred to Byrd's affiliates afterwards). Elizabeth also established a duopoly in 1594, wherein only two troupes/theaters were legally authorized to perform plays in the city of London. The rest of my re-interpretation of historical facts from this period makes it very clear that the system was designed for the monopolization of these fields by a few. Thus, the dominance of writing by six ghostwriters fits this legal framework. In contrast, given how little writers made and how few books were printed/sold in reality per title, it is illogical to claim that any writer could have made a living as a "professional" writer by only publishing as few as a single book or staging a single play in a lifetime.

103timspalding
Modificato: Dic 6, 2021, 4:55 pm

>101 Aquila:

I can't think why three sisters who grew up writing stories for each other in a shared world might share similar writing styles and word usage.

You win Talk today.

104faktorovich
Dic 6, 2021, 5:19 pm

>101 Aquila: In the Renaissance corpus of 284 tested texts, the texts with the bylines of the six ghostwriters in nearly all cases matched the ghostwriters themselves. In the 18th century, in a group of a few dozen texts, there were several authorial bylines that only matched texts with the same byline, thus confirming that when no collaboration/ghostwriting is involved, a clear computational match results. In the 19th-20th century texts I tested, there was too much collaboration/ghostwriting in this small group of 21 texts for any byline to only match itself, but in most cases texts by any given byline matched each other (they just also occasionally matched other bylines).

There is no difference I could find between Iain Banks and Iain M Banks, as this seems to be referring to the same person, who is also a recent writer, and thus his works are not yet in the public domain and difficult to access for quantitative testing.

The Good Omens method is not at all explained in that article, and the software that would have been used can be assumed to be very expensive and only accessible through special privileges to insiders. Thus, this Omens method is entirely non-reproducible. Whereas my method is reproducible by anybody in the public for free.

Just because you are imagining three sisters teaching each other at home, with only one of them getting a basic education outside the home, does not mean that it really happened. Their biographies and writing samples could have been forged to appeal to female readers. Thus, computational linguistics and scientific handwriting analysis are far more accurate in determining authorship in the Brontes' case than trusting the established biographical narrative without such unbiased scientific methods. My tests are for elements that are always divergent between writers because they come from unconscious preferences driven by character-type (extraverted/introverted), interruption patterns (periods/commas), preference for exclamation or questioning, etc. If three sisters wrote on different topics, their research into these topics would have differed, thus introducing word-choice/knowledge/stylistic divergences. Have you looked at the data I provide on GitHub? It is extremely consistent despite my application of thousands of checks as I compared 284 texts to each of the other 283 texts on 27 different tests. Again, if your theory about sisters sharing styles were correct, there would be only 1 style between the 3, and not 2 between the 3, as my data indicates.

105faktorovich
Dic 6, 2021, 6:07 pm

>103 timspalding: What is this victory over "Talk" for the day based on? One example of siblings with different writing styles is Maria and Albert Einstein. They went through the same elementary school before their paths split apart. Similarly, only one of the Bronte sisters pursued an education beyond their family's self-education. Maria also completed a dissertation at the University of Bern, but she did so in the literature field. Aside from Maria's dissertation, I could not find any books she published across her long lifetime. Brilliant authorship is a very rare trait in the human species, and it is very strange if three brilliant writers appear in the same family. And it is also very strange if all three of them develop identical "writing styles" without splitting into their unique fields of interest, as Maria and Albert Einstein did. One example concerning the Brontes from my tests is that "Anne Bronte's" 2 tested novels both had exactly 12 exclamations per 100 sentences, obviously indicating a distinct writing style; but the novels of "Charlotte" and "Emily" varied between 8, 9 and 19 depending on their topics. And again, the lexical density between "Anne's" novels is nearly identical (45.94 and 45.14), whereas "Charlotte's" two novels vary far more widely from each other (47.6 and 51.54). The pattern also stands out in adverbs, with "Anne's" near-identical (7.08 and 7.09), while the three texts by the other two sisters are similar to each other but less so, indicating potential collaborative input from two stylistic hands. On the syllables-per-word test, "Emily's" Wuthering is identical to "Charlotte's" Jane at 1.43, and "Anne's" two novels are near-identical to each other (at 1.44 and 1.45), but the second "Charlotte" novel differs at 1.57. When these different tests are added together, it is clear that the two "Anne" novels were indeed both written predominantly by a single author, but the three "Emily" and "Charlotte" novels share a single dominant author between them, with some collaborative assistance from another author or from a heavy-handed editor, who could have been the author behind the "Anne" byline. There are several other patterns that I did not discuss in my article, such as that "Anne" uses "as well as" as the 4th most-common phrase in both novels, while this phrase only appears in one of the other "Emily"/"Charlotte" texts, in 5th place in "Charlotte's" Professor. When I have tested 18th century texts, there were many instances of these types of common phrases only appearing when an author wrote independently without collaborative input; in the 18th century, there were also several texts that had extremely high rates of near-identical linguistic measurements of this sort, which again proved independent authorship, typically also by the same author claimed in the byline. This just wasn't the case in the Renaissance (where there was a lot of collaboration: re-combining of pieces of texts into new poetry collections/versions of an earlier text, or co-writing), nor was it the case in this short experiment in the 19th-20th century that included the Brontes along with several other authors. If any of you would just glance at my data, perhaps you will finally start actually talking about my studies and not what you imagine them to be about.
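(For anyone who wants to check how cheaply these particular numbers can be reproduced, here is a minimal Python sketch of three of them; the file name is hypothetical, and the syllable and lexical-density heuristics are rough stand-ins rather than the exact tests used in the study.)

    import re

    FUNCTION_WORDS = {
        "the", "a", "an", "and", "or", "but", "of", "to", "in", "on", "at", "is",
        "was", "be", "been", "it", "he", "she", "they", "i", "you", "that", "this",
        "with", "for", "as", "not", "have", "had", "his", "her", "my", "me", "we",
    }

    def exclamations_per_100_sentences(text):
        # count sentence-ending punctuation runs; runs containing "!" are exclamations
        enders = re.findall(r"[.!?]+", text)
        exclaims = sum(1 for e in enders if "!" in e)
        return 100.0 * exclaims / len(enders) if enders else 0.0

    def lexical_density(words):
        # crude proxy: share of words not on a small function-word list
        content = sum(1 for w in words if w not in FUNCTION_WORDS)
        return 100.0 * content / len(words) if words else 0.0

    def mean_syllables_per_word(words):
        # crude heuristic: a syllable is a run of consecutive vowels
        syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w))) for w in words)
        return syllables / len(words) if words else 0.0

    text = open("agnes_grey.txt", encoding="utf-8").read()   # hypothetical file
    words = re.findall(r"[a-z']+", text.lower())
    print("exclamations per 100 sentences:", round(exclamations_per_100_sentences(text), 2))
    print("lexical density (%):", round(lexical_density(words), 2))
    print("mean syllables per word:", round(mean_syllables_per_word(words), 2))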

106susanbooks
Dic 6, 2021, 6:34 pm

Oh. My. God.

When will this end?

107prosfilaes
Dic 6, 2021, 6:43 pm

>102 faktorovich: The "expected output" being judged to be the "correct" answer by previous computational-linguists is the reason they have been making errors in their re-attributions.

So how do you test your model? Mathematical/computer models produce all sorts of wonky results, which is why anyone building one should have trusted test data to feed through the system to check the results.
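(To make the kind of check I mean concrete: a minimal leave-one-out sketch in Python, where each known-author text is held out in turn and attributed from the rest. The file names are hypothetical placeholders, and the distance measure, mean absolute difference over the 100 most frequent words, is just one conventional choice, not anyone's published method.)

    import re
    from collections import Counter

    corpus = {  # hypothetical labelled corpus: author -> plain-text files
        "Howard": ["howard_1.txt", "howard_2.txt", "howard_3.txt"],
        "deCamp": ["decamp_1.txt", "decamp_2.txt"],
        "Offutt": ["offutt_1.txt", "offutt_2.txt"],
    }

    def tokens(path):
        return re.findall(r"[a-z']+", open(path, encoding="utf-8").read().lower())

    labelled = [(author, path) for author, paths in corpus.items() for path in paths]

    pooled = Counter()
    for _, path in labelled:
        pooled.update(tokens(path))
    top_words = [w for w, _ in pooled.most_common(100)]

    def profile(path):
        counts = Counter(tokens(path))
        total = sum(counts.values()) or 1
        return [counts[w] / total for w in top_words]

    def distance(u, v):
        return sum(abs(x - y) for x, y in zip(u, v))

    correct = 0
    for author, held_out in labelled:
        held_profile = profile(held_out)
        # attribute the held-out text to the author of its nearest neighbour
        nearest_author, _ = min(
            ((a, p) for a, p in labelled if p != held_out),
            key=lambda ap: distance(held_profile, profile(ap[1])),
        )
        correct += nearest_author == author
    print(f"leave-one-out accuracy: {correct}/{len(labelled)}")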

Given that these are pop-fiction in a narrow genre with a single storyline, my tests so far would suggest that it is very likely all of these stories were ghostwritten by a single ghostwriter (not necessarily anybody named in the bylines); so there is not likely to be any linguistic differentiation between Conan stories with the Howard byline vs. the de Camp and Carter, or Howard and Carter bylines. Therefore, even if you could create digitized testable versions of these texts, the answer linguistic testing would generate is not likely to be what you would want to hear. It is also possible there are indeed three or more different linguistic signatures in this mix, but only testing Conan texts would not be enough to verify if these signatures definitely belong to the authors in the bylines, as any or all of them could have purchased ghostwriting services.

Robert Howard, L. Sprague de Camp, and Lin Carter are all well known for being writers whose income depended on their writing, and part of communities of writers who could have called them out for using ghostwriters. I don't know who Andrew J. Offutt and Leonard Carpenter are, which makes me wonder why anyone would want to put their name on the book instead of the real author. Looking it up, according to his son, Andrew J. Offutt was pretending to be a science fiction author while he made his real money writing erotica under alternate names; that's someone who sells ghostwriting services, not someone who buys them.

because these tests are simple mathematical calculations, obviously any biases I might have would be irrelevant to the group of texts and the results end-users come up with

Thud. Thud. Thud. Okay, I've stopped hitting my head against the desk.

I have a simple mathematical calculation: does text 1 equal text 2? No? Then they obviously have different authors. Boom, William Shakespeare's works were written by over 30 different authors, all under a collective author name. Or we can use the more sophisticated tools at https://translatedlabs.com/language-identifier: this tells us that most, if not all, of those 268 works are written by the author they give the pseudonym "English EN-GB". Which one is correct?

it is illogical to claim that any writer could have made a living as a "professional" writer by only publishing as few as a single book or staging a single play in a lifetime.

Okay? I didn't make that claim. I would assume that then, like now, many people tried their hands at writing, and at least some of those sold a book or play, or had enough money or fame to get their book or play published, even if it wasn't popular or successful enough that they wrote a second.

1082wonderY
Dic 6, 2021, 6:46 pm

>106 susanbooks: Relax and pass the popcorn. I’m taking odds on how long this can last.

109Taphophile13
Dic 6, 2021, 6:48 pm

Has anyone done a comparison using the I Write Like website described here? https://en.wikipedia.org/wiki/I_Write_Like

110LolaWalser
Dic 6, 2021, 6:55 pm

I'm probably not the only one here who baulks at the number of iffy (as perceived) premises mooted in this thread. Of course, as a non-expert I'm open to criticism that it's just my ignorance getting in the way--and perfectly willing to accept it when that's the case.

Some of the things that are unclear to me:

--how is "style" defined for the purposes of CL (computational linguistics) analysis?

--is CL analysis accompanied by other, traditional forms of analysis, such as, er, comprehensively reading the text? I don't follow the field of AI research much, but I think that it's still the case that computer programmes are better at counting words than understanding their meaning.

Another reason for asking this is that I was struck by this >91 faktorovich:

Now that you mention this point, I looked into it briefly by reading a Guardian article on this topic. The male names that initially appeared on the "Bronte" novels were “Currer”, “Ellis” and “Acton Bell”. Curiously, Arthur Bell Nicholls (Charlotte Bronte's widower) was named as the person who sold "Brontes'" manuscripts in 1895 to a "literary forger" called Thomas James Wise. The echo between Nicholls' middle-name and the three initial male names' last name is not likely to be accidental.

I may be misunderstanding (sort of hoping I am), but does this mean that until reading that Guardian article, the poster didn't know the information she copied over? If so, that seems a strange position from which to undertake this study, which after all goes to negate a great deal of the subjects' biography.

One of the other confusing points is the assumption about shared style--that it's all or nothing, or that style doesn't change over time or may not change deliberately etc. (it's a whole host of questions really).

Related to that, I'm confused by the weight placed on such ubiquitous expressions as "I can't" and "I did not". I realise that these are just two of the six "markers" of style mentioned and that perhaps I'm not appreciating some synergistic result when all six markers are considered. Nevertheless, the fashion in which even just "I can't", for example, has been highlighted encourages me to wonder: since there are only two forms of this phrase, contracted and uncontracted, doesn't that from the get-go impose a terrific reduction on the number of "styles" that can be discerned based on it? Again, I realise there are a few more markers involved, but I don't know whether they are of significantly greater complexity.
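(To make that concrete, measuring this particular marker takes only a few lines of Python; the file name is hypothetical, and the point is simply how coarse the resulting pair of numbers is.)

    import re

    text = open("novel.txt", encoding="utf-8").read().lower()   # hypothetical file
    n_words = len(re.findall(r"[a-z']+", text)) or 1

    def per_thousand_words(pattern):
        return 1000.0 * len(re.findall(pattern, text)) / n_words

    # two numbers per text, nothing more
    print("can't  per 1,000 words:", round(per_thousand_words(r"\bcan't\b"), 2))
    print("cannot per 1,000 words:", round(per_thousand_words(r"\bcannot\b|\bcan not\b"), 2))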

Finally (I'm sorry for the length but I did try to limit the number of points to bring up!), while this proves nothing in itself, I was so struck by the similarity between this result and one instance from my own field that I think it's worth mentioning. Decades ago when sequencing whole genomes had taken off we had a lot of work calibrating methods for sequence comparisons. And I remember very well the excitement when we received some new software and started feeding it our data, looking for genes that fit our clones. Thing is, we started getting amazing hits right from the start--literally the first sampled genome turned out highly matched... and the second... and the third...

Long story short--at a sufficiently low level of discernment all DNA is the same, and ravens are no different to the writing desks.

The tale of hundreds of works previously attributed to dozens (or is it hundreds?) of authors getting reattributed to just six, and the three Bronte sisters getting squished into two (men at that), can't help but seem to me another example of absurd reductionism through computing.

111prosfilaes
Dic 6, 2021, 7:11 pm

>105 faktorovich: One example concerning the Brontes from my tests is that "Anne Bronte's" 2 tested novels both had exactly 12 exclamations per 100 sentences, obviously indicating a distinct writing style; but the novels of "Charlotte" and "Emily" varied between 8, 9 and 19 depending on their topics. ...

I enjoy a good data dive, trying to pull facts out of a morass of limited information. But I try not to be so cocky as to assume that my answers are the end all and be all of anything. My biggest problem with that argument is that it's ungrounded; show me Twain and Dickens and Bulwer-Lytton, and if your tests can separate them, then I'd find that at least a stronger argument. I suspect, however, that authors' styles vary enough to make such tests questionable.

1122wonderY
Dic 6, 2021, 7:13 pm

>110 LolaWalser: **ding, ding, ding!**
Points to the lady. This digitized analysis appears to be the way the protagonist “reads” books. Have you seen how broad the subject matter of her published opinions is? I’m not sure thoughtful reading of everything she claims to have read is physically possible without a time machine.

113Petroglyph
Dic 6, 2021, 7:24 pm

>104 faktorovich: "The Good Omens method is not at all explained in that article, and the software that would have been used can be assumed to be very expensive and only accessible through special privileges to insiders. Thus, this Omens method is entirely non-reproducible. Whereas my method is reproducible by anybody in the public for free."

False. Done in R, with free libraries. The blog post says "Using a training set of texts by Pratchett and Gaiman, I used the R package Stylo to analyze Good Omens. (Specifically rolling nsc classification with 50 features and 5000 words per slice)." Here is an in-depth write-up of the contents of this package (direct link to the pdf).
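(For what it's worth, the same idea doesn't even need R. Below is a rough Python analogue, not the stylo call itself: scikit-learn's nearest-centroid classifier on the 50 most frequent words, with non-overlapping 5,000-word slices standing in for stylo's rolling windows. File names are hypothetical, and the training samples are assumed to be well over 5,000 words each.)

    import re
    from collections import Counter
    from sklearn.neighbors import NearestCentroid

    def tokens(path):
        return re.findall(r"[a-z']+", open(path, encoding="utf-8").read().lower())

    train = {  # hypothetical single-author training samples
        "Pratchett": tokens("pratchett_sample.txt"),
        "Gaiman": tokens("gaiman_sample.txt"),
    }
    target = tokens("good_omens.txt")   # hypothetical

    pooled = Counter()
    for toks in train.values():
        pooled.update(toks)
    features = [w for w, _ in pooled.most_common(50)]   # 50 most frequent words

    def rel_freqs(toks):
        counts = Counter(toks)
        total = len(toks) or 1
        return [counts[w] / total for w in features]

    SLICE = 5000
    X, y = [], []
    for author, toks in train.items():
        for i in range(0, len(toks) - SLICE + 1, SLICE):
            X.append(rel_freqs(toks[i:i + SLICE]))
            y.append(author)

    # plain nearest centroid; stylo's "nsc" additionally shrinks the centroids
    clf = NearestCentroid().fit(X, y)

    for i in range(0, len(target) - SLICE + 1, SLICE):
        print(f"words {i}-{i + SLICE}:", clf.predict([rel_freqs(target[i:i + SLICE])])[0])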

How can it be you are so unfamiliar with R that you "[assume] it is very expensive and only accessible through special privileges to insiders"? Like, seriously, how is it you are unfamiliar with R? That is an honest question, and I would appreciate a straightforward answer. No blame-shifting, no excuses, no comparisons to paid software packages or to your own. How is it you are unfamiliar with R?

You must see that your unthinking reaction of tarring a bog-standard programming language that is famous for being absolutely free and that is routinely used for stylometric analyses with unfounded accusations of "very expensive" and "only accessible through special privileges to insiders" is not a good look, right?

>104 faktorovich: "There is no difference I could find between Iain Banks and Iain M Banks, as this seems to be referring to the same person"

They are different pen names of the same individual, yes, who split his output (litfic and SF) across the middle-name-less and the middle-name-ful one, respectively. This is an irrelevant objection, though. You were asked in >101 Aquila: "I also want to know how many false negatives the system gives when looking at books we know are by the same person. Can it distinguish between Iain Banks and Iain M Banks? What does it say if it can?"

Well, can it? If the algorithm could, what would that mean? If the algorithm couldn't, what would that mean? You can't just assume that "There is no difference I could find" when you haven't even tried to analyze the bodies of work!

Really, we'd like to know if your algorithm is reliable, before we are even willing to accept your complete re-writing of multiple centuries of authorship attribution. So let's put it to the test: how does the algorithm fare when applied to the body of work of a single, undisputed author?

Furthermore, let's assume for a second that post-1926 English-language publications are off-limits.

Have you tried out your algorithm on, say, Ulysses? This text is readily available, in that good old plain-text format that is so very very suitable for machine reading. We also know that this text is by a single author, James Joyce. Uncritical applications of stylometrics to this text can produce the wrong results, namely that it was written by multiple different authors (I suppose you would call them ghost-writers). We know that that is incorrect: Joyce deliberately employed multiple styles throughout this book. Rather, this means that stylometric models can be wrong when they give massively unexpected results and must, therefore, be used judiciously.

How many authors does your algorithm say this text is by? Is the author different from the one(s) who wrote Dubliners? From A Portrait of the Artist as a Young Man? From Exiles? Is there a single author behind these texts? Multiple? How are they distributed? What does your algorithm say about the "contributions" of various ghost-writers (if any) across these texts?

If your algorithm finds multiple authors -- does this mean that multiple people ghost-wrote Ulysses? If your algorithm finds a single author -- how much leeway is there within a single author's voice for this to be the case?

One present-day author whose works we know beyond a reasonable doubt that you have direct non-copyrighted access to is Anna Faktorovich. Can you apply your algorithm to yourself? You've produced several volumes of both fiction and non-fiction, i.e. a varied body of work. If you apply your algorithm to your own works, what are the results? Are your works by the same person? How many ghost-writers (if any) are present in your own non-fiction? In your fiction?

114faktorovich
Dic 6, 2021, 8:22 pm

>107 prosfilaes: As I explained even in my brief interview response, my method includes self-checks within the basic process because it includes 27 different linguistic tests, or tests for different types of linguistic measurements. The combined table of matches vs. non-matches indicates which texts in the tested corpus are similar to each other and thus share a linguistic signature, and which are different from these and form their own linguistic-signature groups. Because there are 27 quantitative tests, at least around a dozen of them have to come up with the same attribution answer for a match between texts to be established; each of these tests is equivalent to an alternative re-verification method applied to the same corpus. Once I analyze the data and separate the texts into linguistic signatures, I do not trust any one of the bylines in a given group, but instead test all of them on their established biographical facts to consider which of them are the most likely potential ghostwriters or authentically-bylined writers, vs. pseudonyms or another category. While at the start of this study of the Renaissance there were many uncertainties, all of these wonky elements were resolved by the time I finished writing the 698-page book and expanded the corpus to 284 texts; you really have to read the whole book to see why the final conclusions are entirely un-wonky.
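(Purely as an illustration of that combining step, not the actual thresholds used in the study: a toy Python sketch that declares two texts a match on a test when their normalised values are close, and an overall match when enough of the 27 tests agree. The 0.05 tolerance and the 14-of-27 requirement are placeholder values standing in for the "around a dozen" mentioned above.)

    from itertools import combinations

    def match_table(features, tolerance=0.05, required=14):
        """features: text name -> list of 27 measurements, each scaled to 0-1."""
        table = {}
        for a, b in combinations(sorted(features), 2):
            agreeing = sum(
                1 for x, y in zip(features[a], features[b]) if abs(x - y) <= tolerance
            )
            table[(a, b)] = 1 if agreeing >= required else 0
        return table

    # hypothetical toy input: three texts, 27 normalised measurements each
    toy = {
        "text_A": [0.50] * 27,
        "text_B": [0.52] * 27,
        "text_C": [0.90] * 27,
    }
    print(match_table(toy))
    # {('text_A', 'text_B'): 1, ('text_A', 'text_C'): 0, ('text_B', 'text_C'): 0}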

In the Conan case: one of the most difficult parts I found in reaching the final conclusions in my study was distinguishing the underlying ghostwriters from ghostwriting-contractors/pseudonyms etc. In the case of most modern writers, it is extremely unlikely that a pop byline could be a pseudonym without a real person behind it, or this information would have become public. It would also be difficult to trace the finances of any modern author (Andrew J. Offutt included) to determine if there is evidence there of money coming in or out related to ghostwriting services. This is precisely the type of information that I found to confirm my findings in Renaissance financial documents (such as "Henslowe's Diary") that have been digitized. My point about these Conan writers is that it would be far more difficult for any group of several writers to share a similar Conan style, and it would be very easy for a single ghostwriter to write the full series while selling bylines to multiple ghostwritten-byline purchasers. I certainly cannot guess without any research who the ghostwriter vs. the ghostwriting-purchasers in this group were.

I tried to use your TranslatedLabs link to check one of the texts as an experiment, and I received an error message that I had reached my data limit for the day without having tested any previous texts; so this tool does not work. I have no idea what 268 texts you are referring to, or why TranslatedLabs has attributed around 268 texts to EN-GB. It is possible that these are the over 200 plays that Percy claimed to have written across his professional career under one of his pseudonyms.

This was a period when the English language mutated, multiplying in size as Middle English became Early Modern English. The Workshop (including Percy under bylines such as "Shakespeare") made up so many new words, or adopted so many words from foreign languages, that their texts might have read like what they were, a combination of multiple foreign languages, to most of the public in England, which had not yet been taught this new language. It is thus absurd to imagine that even the brightest students managed to learn this enormous volume of new textual material etc. to create an extremely specialized and formulaic play, and then never wrote or published again (since there are few closeted unpublished plays from these decades).

115melannen
Modificato: Dic 6, 2021, 8:35 pm

>113 Petroglyph: >>Like, seriously, how is it you are unfamiliar with R?

Reading through this entire thread, this is what I'm really stuck on! How is it even possible to do that much stats work and not know what R is? I too am super curious.

That's like, idk, claiming to have discovered evidence that the Romans discovered America and not knowing what Latin is.

116faktorovich
Dic 6, 2021, 8:45 pm

>109 Taphophile13: I Write Like is not a scholarly attribution method. 1. Nowhere on this website is there any statement regarding the types of elements the site tests for to determine what a text is like. 2. Its Wikipedia page basically says it only tests for vocabulary. I tested one of my 284 old-spelling texts that uses Early Modern English vocabulary, and it determined that the text is like Agatha Christie. The result did not include any information about the words that were similar between this text and Christie. Thus, it is entirely possible that the results are: 1. entirely random, without any vocabulary or anything else actually being tested, or 2. tests for just a few rare words picked at random from famous authors like Christie. This tool almost appears to have been intentionally designed to prove computational-linguistic attribution is impossible, because its results and its lack of stated methodology have rightfully led users to criticize it for producing nonsensical results. Unlike this approach, my method is entirely transparent, reproducible, and leads to consistent results that accurately attribute authorship.

117aspirit
Modificato: Dic 6, 2021, 9:02 pm

The source code for I Write Like is linked at the bottom of the input page, if anyone is curious. And >109 Taphophile13: yeah, I've played with it multiple times over the years. The first time, I learned about authors whose work I have since enjoyed. It's a fun program.

118aspirit
Modificato: Dic 6, 2021, 9:03 pm

Who else remembers the old gender-guessing algorithms that were so frequently wrong for published writing?

That was one of the first things I thought of when this topic was introduced. That, along with all the bylines that turn out to belong to a group of authors, along with certain types of online warriors who believe only men write "real literature", along with how an author can deliberately change style markers to match or contradict themself or other authors, along with how much publishing editors used to determine style,....

119faktorovich
Dic 6, 2021, 9:19 pm

>110 LolaWalser: You are right in questioning the definition of "style" in computational-linguistics because it has rarely been clearly defined. For the purposes of my study, I judged "style" based on 27 tests that measured: punctuation, lexical density, parts of speech, passive voice, characters and syllables per word, psychological word-choice, and patterns of the top-6 words and letters. Style can also refer to literary concepts such as metaphors, or variant social meanings of language, or various other things that are not quantifiable in a simple table that can compare hundreds of texts to each other in 1s and 0s. The number of different punctuation marks, passive voice and the other elements I measured can all be broken down into single numbers (percentages or per-100 sentences etc.). Therefore, when I state that there are only 6 linguistic styles in the 284 texts I tested from the Renaissance, I am not referring to something vague and uncertain, but rather to the mathematical facts in these texts' data.

My method is accurate in contrast with previous approaches in large part because I have combined the quantitative analysis with various other steps that analyze the structure and contents of these texts. One clue to the uniqueness of my approach is that I have translated 12 volumes of previously untranslated Renaissance books as part of this series to provide additional attribution proof in the annotations. No previous computational linguist has also translated the texts they are computing. As I mentioned earlier, I relied on publicly available free software to count the number of punctuation marks etc. in these texts. The innovation in my method is combining 27 different tests, and adding various other verification approaches on top of these.

As I explained, my forthcoming Journal of Information Ethics article only briefly mentions the Brontes in a group of 21 different texts. The central argument in that article is disproving and showing the errors that were made in a previous computational-linguistics article on "Unmasking" by a group of researchers. I did not perform handwriting analysis on the Brontes' handwritten texts because this would have added at least a few thousand words to the article, and I would have had to add similar handwriting studies for all of the other 18 texts, expanding this article into a book.

Percy's first plays and interludes, which he published in his youth in 1584-5, match his latest plays such as "Captain Underwit", which was published a year after his death in 1649. While there are more formatting/spelling errors and other glitches in his earliest experiments, his style (and that of all of the other authors I tested) does not change significantly enough across a career for these texts not to be clearly identifiable as a single style with my computational method.

The 3-word-phrases (the top 6 most-frequently-appearing of these out of all possible 3-word phrases in a given text) I pointed out as revealing in the Brontes' case are not one of the 27-quantitative tests involved in the basic method. I collected these 3-word-phrases for all texts and used them to find obvious patterns of phrases that only appear in the work of any given authorial-style. The use of a contracted vs uncontracted "I can't" is a significant stylistic divergence that is revealing when one of these appears among the top-6 most-common phrases because this means there are many instances of this preferred usage in the text vs. the alternatives. I did not use any 3-word-phrase patterns to establish attributions, but just used them in the writeups to show that the less easily understood mathematic matches vs. non-matches were also confirmed by these verbally-descriptive elements.

I can see how you perceive this to be the case, but I started working on this project around 3 years ago. I saw glitches like the ones you encountered first as I wrote the 300,000 words on the 18th century, and gradually I adjusted my method, and did more research until I identified the glitches and the strands of "DNA" were clarified and finely defined. And then I wrote the 698 pages in Volumes 1-2 of this study after expanding the corpus from around 100 to 284 texts. I expanded it not for the sake of checking more texts but because I needed more data to check potential glitches. There have not been any previous computational-linguistic studies of the British Renaissance that tested 284 texts, so my findings are the most precise and deeply thought-through results in this field. There are 104 bylines in my corpus that I am re-attributing to 6. I have proven the Renaissance 6-ghostwriters case in this series of 14 volumes, whereas I have only written a couple of pages about the Brontes. I don't know why you guys are more mesmerized by the magic trick of three women turning into two men, but I would have to test hundreds more texts and perform years more of research to reach the same level of certainty about the Brontes as I have already reached about the Renaissance. Even if the 6-ghostwriters theory is the more difficult of the two to believe, it is the one for which the evidence I have collected establishes it as fact.

120paradoxosalpha
Dic 6, 2021, 9:23 pm

>109 Taphophile13:

It says I write like Dan Brown, which makes me sad.

121Taphophile13
Dic 6, 2021, 9:26 pm

>120 paradoxosalpha: So sorry. That would make me sad too. Perhaps a different test paragraph would help.

122amanda4242
Dic 6, 2021, 9:26 pm

>109 Taphophile13: I tried it with three reviews I've posted here, and I got Kurt Vonnegut, David Foster Wallace, and Margaret Atwood.

123Taphophile13
Dic 6, 2021, 9:28 pm

>122 amanda4242: It seems you are very versatile.

124faktorovich
Dic 6, 2021, 9:56 pm

>111 prosfilaes: Since you asked, I uploaded 3 texts by these authors from Project Gutenberg: Bulwer-Lytton - Godolphin; Dickens - Oliver Twist; and Twain - A Tramp Abroad. I ran them only through Analyze My Writing's basic tests. There are obvious divergences that prove these are texts by different authors. 1. Punctuation: Bulwer (173 commas per 100 sentences), Dickens (191), Twain (137). There was a bit of similarity between Bulwer and Dickens on questions and exclamations, but they were still different from each other, and Twain was very different from both of them. Dickens and Twain were more similar to each other in their use of semicolons, yet still a point apart. Even if I ran these 3 texts through all of the 27 tests, the results would not be conclusive, because at least 20 texts should be in a corpus for the 18% similarity vs. 82% divergence rule to apply. While the difference between 10 question marks in Bulwer and 11 in Dickens seems small with only 3 texts in the mix, if 100 texts were measured, several of Dickens' and Bulwer's texts might prove to have exactly 10 vs. 11 measures, and this would confirm that these are distinct linguistic styles. The commas measurement is uniquely divergent between these 3, so if I only referred to it, it would make this simple test seem like it is enough to establish authorship. But it is too easy for a glitch or an abnormality to creep into any one of these tests, which is corrected when the corpus is big enough and the full range of tests is applied, accompanied by additional verification approaches that take a few days to carry out. Let me know if you want me to run the full experiment on any group of texts, but there should be a logical reason for the experiment other than curiosity. The data on my GitHub already establishes the accuracy of my method on two different types of groups of texts.
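
For anyone who wants to double-check this kind of count away from the website, the same arithmetic takes only a few lines in a scripting language. The R sketch below is only an approximation (naive sentence splitting, hypothetical file name), not the exact counting rules Analyze My Writing applies:

# Count occurrences of a punctuation mark per 100 sentences in a plain-text file.
count_per_100_sentences <- function(path, mark) {
  text      <- paste(readLines(path, warn = FALSE), collapse = " ")
  sentences <- unlist(strsplit(text, "[.!?]+"))                       # naive sentence split
  hits      <- lengths(regmatches(text, gregexpr(mark, text, fixed = TRUE)))
  100 * hits / length(sentences)
}
count_per_100_sentences("oliver_twist.txt", ",")    # hypothetical local file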

125prosfilaes
Dic 6, 2021, 9:56 pm

>116 faktorovich: my method ... leads to consistent results that accurately attribute authorship.

How do you know that? You've told us about the results that contradict what was previously believed; you've told us nothing about the results that establish that your method works.

126faktorovich
Dic 6, 2021, 9:58 pm

>112 2wonderY: If you calculate the number of words I write per-hour and how much research I have done just to respond to this thread of texts, you will figure out that it is indeed possible for me to read about, research and to publish about the range of different subjects I have covered.

127paradoxosalpha
Dic 6, 2021, 10:03 pm

>121 Taphophile13:

While my published fiction is "like" Dan Brown (although not as remunerative as his), a sample from one of the book reviews I wrote last month is "like" H.P. Lovecraft, which is more satisfying.

128Taphophile13
Dic 6, 2021, 10:10 pm

>127 paradoxosalpha: That's a much better result. My own reviews are like Arthur C Clarke whom I haven't really read.

129faktorovich
Dic 6, 2021, 10:39 pm

>113 Petroglyph: You have linked to an article about the "R package Stylo" and not to software that can be used to test texts. This article explains that this package uses basic n-gram or word and character analysis on texts. The most common words and characters are only 2 of the 27 tests that I applied to my corpus, so my tests are far more complex and thus more accurate than this R approach. I only have access to TexShare in my rural region of Texas, as I do not have institutional support for my research; so I could not access this package even if it is available at some research universities. Most of the public is in a similar boat, does not have access to these types of institutional resources, and thus cannot check this method for themselves. In contrast, the tools I cite are publicly accessible for free from any computer with an internet connection. By "50 features" you mean you analyzed only 50 semi-frequent words out of these texts. I explain why testing only a few selected words starts any study with extreme bias that cannot lead to trustworthy results. What your write-up lacks is the specific full list of the 50 features you selected, why they were chosen, and the raw data you generated before you turned it into your seemingly neat graphs. The R programming language is not relevant to my analysis because the basic tools for counting the number of punctuation marks etc. have already been programmed and are available for the public to use. If I had used a programming language to test the texts I analyzed, I would not have been able to give the public links to free websites where they could double-check my results. Alternatively, if I had developed my own website that allowed users to perform the 27 tests simultaneously, this would have taken years that I have instead spent on writing the 14 volumes of this series; there was no need for me to reproduce existing simple linguistic counting programs like Analyze My Writing. My goal was establishing the correct attributions of the Renaissance, not selling a computer program I designed.

You have explained that "Iain Banks and Iain M Banks" are 2 pen-names of the same person, so checking the texts by this author against each-other is as rational as checking 2 texts by any established byline if the author happened to use a middle initial in one of them. There is no mystery to solve here unless you are saying you doubt these two Banks are the same person; this can be the case if somebody else wanted to benefit from Banks' status and used a similar byline to fool readers into interpreting the texts as being by the famous guy.

Sure, I can test a single undisputed author. I would recommend testing Dickens or Twain, since both were author-publishers and I anticipate their works will show clearly a single authorial style, and both of them have written dozens of digitized texts that can be tested. Go ahead and name the specific challenge and I'll proceed with carrying out the experiment.

To find how many underlying authorial-styles are behind Ulysses, I would have to test a few of Joyce's other texts and dozens of texts by other authors published in Joyce's circles/ by the same publishers/ in the same genres. Intuitively, the style in Ulysses is extremely different from "Dubliners" and "Portrait", so if my tests indicated different authors wrote these texts, it would not be unexpected from my perspective; if the styles do match, this would confirm that my method can identify an authorial style even if texts seem to be extremely different intuitively. You do not know if Joyce truly wrote Ulysses himself, or if he used different styles, or if this was a text that combined contributions from multiple writers that were stitched together. The tests would only indicate which segments of Ulysses were by different writers if there were around 100 texts in the corpus that included all of the potential ghostwriters with multiple samples from each. To explore this question fully, I would have to write an entirely new book. So, instead of sending me on this Ulyssian journey, why don't you instead start by taking me up on the offer to read my finished Renaissance series for free by emailing me to request a copy at director@anaphoraliterary.com.

Sure, I can apply the tests to myself. You should know that I have done professional ghostwriting in the past, though I cannot disclose for whom. You would have to test millions of texts with around a million bylines to figure out what I have ghostwritten, as it is only a few projects in a sea of modern publishing. It would be less extreme for me to just test the texts with my own byline, but testing only my own texts against themselves would once again fail to establish what they are different from. For every text by me, I would have to add a few texts by other bylines writing in the present (and again, I try to avoid testing anything published after 1926). Testing works by any single given author produces a set of numbers that represent the range of that author's style; only when this style is compared with other styles do these tests indicate authorial attributions with my method.

130Petroglyph
Dic 6, 2021, 10:39 pm

>115 melannen:
I don't think she does stats!

Well, averages and characters per word, yes. But I don't think she applies even t-tests or chi-square tests. How statistically significant are the differences in adjectives-per-thousand-words between the ghost-writer behind one Brontë and the one behind the other?

Instead of using software and performing the tests herself (word counts, POS assignation, ...), she copy/pastes texts into online text analyzers, and then copy/pastes their output into a spreadsheet. And going by the interview, where she encourages people to copy her workflow, part of the data treatment consists of manually changing everything within 17-18% similarity for one text on a given test result to a 1, and everything else to a 0. And to do so until all the percentages have been changed into ones and zeros.

In other words, she's manufacturing the data. Changing percentages of similarity (as spit out by online generators) to either 1 (similar) or 0 (not similar at all). No wonder her results are so different from everyone else's.

As I said back in message #11: Garbage In, Garbage Out.

131faktorovich
Dic 6, 2021, 10:48 pm

>117 aspirit: I read through some of the source code. Christie is only mentioned in a long list of author photographs. Most of the code consists of messages users receive while using the program. What is needed instead of all this programmer-speak is a simple set of linguistic rules that the program uses to establish what a given text is similar to.

132faktorovich
Dic 6, 2021, 10:50 pm

>118 aspirit: I explain, in my previous book "Gender Bias in Mystery and Romance Novel Publishing", the gender bias that has led publishers to prefer men with linguistically-dense styles for mystery-writing, and women with linguistically-light styles for romance-writing.

133faktorovich
Dic 6, 2021, 10:53 pm

>125 prosfilaes: I have explained that I researched handwriting analysis and documentary evidence that have confirmed my findings. You can see my articles where I include my handwriting findings on the series' page: https://anaphoraliterary.com/attribution. Once again, just ask me to send a review copy to you of the entire series, and you can see for yourself how I have confirmed my central computational-linguistic method with various other types of evidence.

134melannen
Modificato: Dic 6, 2021, 10:59 pm

>130 Petroglyph: But *I* do stats for fun by copy-pasting things spat out by online analyzers into spreadsheets and screwing around with them without ever quite understanding chi-square. Once I even proved that the Voynich manuscript was a map of Oak Island! But I at least know what R *is*, even if I'm too lazy to use it.

>129 faktorovich: R is free software. Anyone can download it and use it for free, anywhere in the world. It's available here: https://www.r-project.org

Stylo is a free package for R. It can also be downloaded and used for free, by anyone in the world. It's here: https://cran.r-project.org/web/packages/stylo/index.html

They're both fully documented, and are also GPL licensed, which means you can even look at the programming that makes them work, to make sure it's doing what it says it does.

They are a little bit less intuitive to use than tools like Excel, but for someone who is doing groundbreaking statistical research it's definitely worth the effort to use the many free online tools to learn how to use them!

135faktorovich
Dic 6, 2021, 11:04 pm

>130 Petroglyph: I have a section in the Renaissance book where I test the mean results (t-tests/chi-square) in the data against outliers and check random statistics against the unique results I received in this study. I find in this section that my results cannot be accidental, but rather must indicate blatant linguistic distinctions between the authorial styles.

I don't think you understand what it means to run a counting-test on a text. It does not matter if I have invented my own exclamation-counting program or used a free public website to derive the answer - the results are the same - the system just counts the frequency. Obviously I have to copy the data from the software program into my spreadsheet if I need to run comparative tests on this data. I also don't think you understand that in computer languages "1" means something like "yes" (in this case "similar") and "0" is like "no" (or in this method "different"), so by changing the data into basic "similar" vs. "dissimilar" outputs based on their degree of similarity or difference, and then adding up the number of these "1" (similar) results, I establish the precise degree of similarity between each pair of texts in the corpus as a possible result between 0 and 27, an enormous range of similarity variance. I am not manipulating data, but rather recording a 0 when texts do not fall within 17-18% of each other, and a 1 when they do fall within this range. Others do not make their data available, but my GitHub page has all of this data, including the raw numbers from the tests, and you can see in other tables which of these numbers I have changed into 1s and 0s.

136faktorovich
Dic 6, 2021, 11:16 pm

>134 melannen: Again, the R programming language is designed for people to create statistical tools that count the number of commas etc. These programs have already been developed and are available for free online, so it is nonsensical for anybody to write new programs that repeat this completed labor. The Stylo package you linked to does not have any practical functionality accessible to the public that would actually allow for linguistic testing of texts to determine authorship. When I was researching what method I should use, I considered not only these tools but dozens of other tools advertised in previous computational-linguistic articles. After downloading a few of these programs and experimenting with them, I found that most do not function properly, and many are not really freely available or have pay-walls blocking their actual application. If you disagree, post a set of simple steps (in 1 paragraph) like the ones I gave for my method that the public can use to test their own texts with Stylo. If you cannot summarize the steps in 1 paragraph, it is not a method that can be tested without specialized knowledge, and thus it is only for insiders, and this makes it very easy for specialists in this field to manipulate data, because their results cannot be checked or clearly understood even by advanced literature research specialists, and probably not even by programmers in other fields. There is nothing uniquely special in using software that the public cannot use or understand. Excel or any spreadsheet is enough for my method; because of this, even a middle-schooler can use it to test if his classmates are buying papers from a paper-mill. Solving the intended authorship mysteries is the goal of my method, not making it sound unapproachably difficult and incomprehensible.

137Petroglyph
Dic 6, 2021, 11:20 pm

>135 faktorovich:
No, you change a degree of similarity into complete identity ("1" / yes) or complete difference ("0" / no), thereby obliterating the finer distinctions. What's your reason for enforcing this 17-18% cutoff where not similar (0) becomes similar (1)? Why those two figures? Why not 19? Are there cases where you count everything within an 18% range as similar, and other cases where you cut things off at 17%? Why?

Adding up the ones to get to a similarity score presupposes that all of your 27 tests have identical weight. Surely, word density or sentence length are more telling than whether an i or an s is in the top six characters?
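
To spell out my reading of the procedure, every pair of texts ends up with a score like the one sketched below (simplified: I am approximating the 17-18% cutoff with a plain relative-difference threshold, since the exact row-counting rule hasn't been fully specified):

# Sketch: "tests" is a matrix with one row per text and one column per test
# (raw numbers such as commas per 100 sentences). Each pair of texts gets a
# score from 0 to 27: the number of tests on which they land within the cutoff.
similarity_counts <- function(tests, cutoff = 0.18) {
  n   <- nrow(tests)
  sim <- matrix(0, n, n, dimnames = list(rownames(tests), rownames(tests)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      rel_diff  <- abs(tests[i, ] - tests[j, ]) / pmax(abs(tests[i, ]), abs(tests[j, ]), 1e-9)
      sim[i, j] <- sum(rel_diff <= cutoff, na.rm = TRUE)   # count of the "1"s
    }
  }
  sim
}

Note that every test counts for exactly one point in that sum, which is the weighting problem I'm asking about.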

138Petroglyph
Modificato: Dic 6, 2021, 11:36 pm

>136 faktorovich:
Oh my god. Stats are too hard for the average person, therefore her method, which completely sidesteps mathematical constraints designed to remove bias, is to be preferred.

"The Stylo package you linked to does not have any practical functionality accessible to the public that would actually allow for linguistic testing of texts to determine authorship."

That is exactly what that package was designed to do. To vastly superior degrees of sophistication than your handcrafted spreadsheets.

"If you disagree, post a set of simple steps (in 1 paragraph) like the ones I gave for my method that the public can use to test their own texts with Stylo. If you cannot summarize the steps in 1 paragraph; it is not a method that can be tested without specialized knowledge, and thus it is only for insiders, and this makes it very easy for specialists in this field to manipulate data because their results cannot be checked or clearly understood even by advanced literature research specialists, and probably not even by computer programs in other fields."

If you start from a position of basic distrust of specialists, then a position of distrust is exactly where you'll end up. Some of these methods rely on linear algebra to condense an n-dimensional space (say, for the n most frequent words in a corpus of texts) into a two- or three-dimensional scatterplot. One commonly used method to do this is principal component analysis (PCA).
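
As a toy illustration of that condensation step, assuming a hypothetical matrix "freqs" of relative word frequencies (one row per text, one column per frequent word), base R alone will do it:

# Project the word-frequency space onto its first two principal components
# and plot the texts in that reduced plane.
pca <- prcomp(freqs, scale. = TRUE)
plot(pca$x[, 1], pca$x[, 2], type = "n",
     xlab = "PC1", ylab = "PC2", main = "Texts in word-frequency space")
text(pca$x[, 1], pca$x[, 2], labels = rownames(freqs))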

If any of these words frighten you, you can't just throw together your own method and assume you'll get equivalent or better results than the ones specialists developed to handle multi-million word corpora and to avoid erroneous interpretations.

(edit: a word)

139Petroglyph
Dic 6, 2021, 11:35 pm

>134 melannen:
Nothing wrong with that! Tinkering with things teaches you about them. And at least you're not claiming to have decrypted the Voynich manuscript, nor are you demanding to be taken seriously.

140Petroglyph
Modificato: Dic 6, 2021, 11:41 pm

>129 faktorovich:

Read the relevant portion of my message in >113 Petroglyph: again, but more slowly. R is free. Anyone who learns this programming language can use it. No institutional access is necessary. Just a computer. Which you have.

"I could not access this package even if it is available at some research universities"

Download R. It's free. Probably the GUI app Rstudio, too. That one is free too, and it makes things easier. Then, at the R command line, type this:

> install.packages("stylo")

Boom. done. Just like any other R package.
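
From there, a minimal run looks something like this (a sketch: I'm writing the argument names from memory of the package documentation, so double-check them there; the corpus folder is hypothetical):

library(stylo)
# assumes a subfolder "corpus" containing one plain-text file per work
results <- stylo(gui = FALSE,
                 corpus.dir    = "corpus",
                 analysis.type = "CA",        # cluster-analysis dendrogram
                 mfw.min = 100, mfw.max = 100)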

"basic n-gram or word and character analysis on texts"

Lol. It uses so much more. Read the pdf instead of skimming the initial paragraphs. Your home-brew algorithm with 27 tests is no match for the thousands of comparisons that the various functions in this package can run on your corpus.

And again: R is one of the most commonly used programming languages to perform statistical analyses on large bodies of data. The fact that you are unaware of its existence means that you do not know what you are talking about. You've been tinkering on your own, away from the community of digital humanists and stylometry people in particular. And you are unaware of basic tools used in the business. FREE tools.

"You have explained that "Iain Banks and Iain M Banks" are 2 pen-names of the same person, so checking the texts by this author against each-other is as rational as checking 2 texts by any established byline if the author happened to use a middle initial in one of them. There is no mystery to solve here unless you are saying you doubt these two Banks are the same person; this can be the case if somebody else wanted to benefit from Banks' status and used a similar byline to fool readers into interpreting the texts as being by the famous guy."

The question was if your algorithm would correctly identify Iain Banks and Iain M Banks as one author. As a test for your method. One pseudonym was used for litfic, the other for SF. How different are these two bodies of work from each other? We don't know, but you claim to have an algorithm that does author identification. You also claim your method works. If you test all but one of his litfic books, and all but one of his SF books, would your algorithm correctly predict that those final two books were written by the correct pseudonym?

"There is no mystery to solve here"
No, indeed. We know these two bodies of work belong to one author. This is to test the reliability of your algorithm. If your algorithm can be applied to "mysteries", you have to show first that it works on non-mysteries.

"To find how many underlying authorial-styles are behind Ulysses, I would have to test a few of Joyce's other texts and dozens of texts of other authors published in James' circles/ by the same publishers/ in the same genres."

False.
You could chunk Ulysses into several thousand-word bits and run your tests on those. See how likely it is that they are by the same author. Then compare the other Joyce texts using the same chunking size. Or compare the various chapters of Ulysses and see how many of those cluster together as written by "the same author." That would give you an idea of how many different "author signatures" are at work here. Or not.
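
Concretely, the chunking step is only a few lines; a base-R sketch (hypothetical file name, 5,000-word chunks):

# Split a long text into 5,000-word chunks and write each to its own file,
# so the chunks can then be compared as if they were separate "texts".
words  <- unlist(strsplit(paste(readLines("ulysses.txt", warn = FALSE),
                                collapse = " "), "\\s+"))
chunks <- split(words, ceiling(seq_along(words) / 5000))
for (i in seq_along(chunks)) {
  writeLines(paste(chunks[[i]], collapse = " "),
             sprintf("ulysses_chunk_%03d.txt", i))
}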

If your algorithm only works to find other authors from your corpus in a given mysterious text (let's call it X), then of course that is what your result is going to be. If your algorithm explicitly tries to identify contemporaries of the author of text X as the ghost-writers, then of course you are going to find them.

Authorship attribution algorithms are designed to correctly attribute Text X to potential authors. But they can also provide an author's profile, given a body of work that is attributable to author X.

"To find how many underlying authorial-styles are behind Ulysses (...) The tests would only indicate which segments of Ulysses were by different writers if there were around 100 texts in the corpus that included all of the potential ghostwriters with multiple samples from each. "

This is the crux of your problem. This, right here. Your method assumes that a shockingly low degree of similarity between texts (based on a mere 27 characteristics, mind you) is evidence of the author of one contributing to both. Your method finds ghost-writers because that is all it is looking for. Your method does not create author profiles based on one author's body of work, as you yourself admit. Your method only works if you compare two or more authors with the explicit purpose of assigning one text to another. It treats genre similarities as authorial signatures. It looks for similarities between texts, and ghost-writing is the only explanation it allows for.

Your methodology is broken. Genuinely broken.

141melannen
Modificato: Dic 6, 2021, 11:47 pm

>136 faktorovich: My confusion here is not that you aren't using R in your work - I agree, I'm also too lazy to use R when I play with stats! - but that you weren't familiar with what it was or the fact that it is (100%, totally, completely) free (for everyone) (with no hidden charges).

Anyone who has done even minimal study of statistics should be aware of R even if they choose not to use it. Since it's completely free, it's the most commonly used software for statistics, and it's used in most introductions to statistics - I was introduced to it in a beginner course on stats for non-math people.

It's very hard for me to understand how someone could have been as involved as you are with statistical work without immediately recognizing what R is when it was mentioned! Even if you've never had an opportunity to take a modern statistics class, if you'd made any attempt to review other people's work in the field you would have seen it constantly mentioned, too. Like I said - it's like somebody doing research on Rome who doesn't know what Latin is. I don't expect you to speak it, but if you don't even know that it's a language, it's hard for me to trust that you actually know anything about Rome.

142melannen
Dic 6, 2021, 11:47 pm

>139 Petroglyph: Oh I have definitely decrypted it! Like I said, it's a map of Oak Island, where the secret original manuscripts of Shakespeare that were written by Roger Bacon are hidden. Elizebeth Friedman solved it first but the FBI stopped her from sharing her solution; it's all encoded in the folk song "The Schooner I'm Alone."

(I am absolutely definitely not taking it seriously. If I was I'd be using R.)

143Petroglyph
Modificato: Dic 6, 2021, 11:57 pm

>110 LolaWalser:

I can answer some of your questions.

"How is "style" defined for the purposes of CL (computational linguistics) analysis?"

For identifying an author's "style", you take some of that author's undisputed works (at the least several tens of thousands of words, though methods exist that can leverage smaller quantities of text), and then apply various tests. Here are a few common ones: the n number of most frequent content words (nouns, verbs, adjectives, adverbs); The n number of most frequent function words (pronouns, prepositions); Parts of Speech (i.e. how many nouns, verbs, adjectives, adverbs, prepositions, pronouns, etc); average word length; average sentence length; and vocabulary richness (the proportion of different words to total words -- some authors use a more limited vocabulary range than others). You can use strings as well: the n number of most frequent 5-character strings, or 3-word strings (or more).

A combination of all this can be used as a shorthand for identifying an author's style. Not in the sense of deep-reading their texts, but in the sense of how this particular author uses the language they write in. It's a mechanistic approach that looks at features that are hard to fake. Especially the sequences of three characters (or five, or whatever) -- there's no real way of faking that. People just don't pay attention to that sort of thing when producing meaningful texts and arranging information in accessible and/or pleasing patterns.
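
To give a feel for how mechanical these measures are, here is a rough base-R sketch of three of them (average word length, average sentence length, and type-token ratio as a crude vocabulary-richness proxy); the file name is hypothetical and the tokenisation is deliberately naive:

text      <- tolower(paste(readLines("some_novel.txt", warn = FALSE), collapse = " "))
sentences <- unlist(strsplit(text, "[.!?]+"))       # naive sentence split
words     <- unlist(strsplit(text, "[^a-z']+"))     # naive word tokeniser
words     <- words[words != ""]

c(avg_word_length     = mean(nchar(words)),
  avg_sentence_length = length(words) / length(sentences),
  type_token_ratio    = length(unique(words)) / length(words))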

Note that I separated content words from functional words earlier. Functional words (such as pronouns and prepositions) are often not used in identifying a style, because they pattern differently across genres (a simple example: "I" and "you" are much less frequent in non-fiction texts, so any differences in their distribution across a novel and a tractate cannot be taken as evidence of difference of authors). Obviously, if you're dealing with the same genre, they're fine to include, and they're quite telling.

In order to run these tests multiple times on the same work, texts are often divided into chunks of x number of words (500, 1k, 5k, 10k are common; accuracy tends to increase with chunk size).

Software packages exist that offer out-of-the-box stylometry tests: once you get your texts in a machine-readable form, and once they have been run through a tagger for Part of Speech, you can point the stylometry algorithm at your corpus and it will automatically perform the segmentation into 5000-word blocks (or whatever size you've specified) and apply a battery of measures to them.

These algorithms are continually refined and adapted: other features are thought up and tried out, and different algorithms apply better or worse to certain types of texts.

"--is CL analysis accompanied by other, traditional forms of analysis, such as, er, comprehensively reading the text?"

Well, these tests are often run on corpora of many millions of words. It's impractical to deep read all of these. Computational methods were designed to deal with quantities of texts that humans could not possibly digest, and to extract patterns that humans could detect only painstakingly and with ridiculous investments of time and effort.

Analysts use qualitative judgements when applying algorithms to texts: identifying one author across different genres requires a thought-out selection of features looked at (or rather: excluding certain features). If two texts compared feature a male vs a female main character, then the distribution of the pronouns "he" and "she" cannot be taken as an argument to confirm or deny authorship, for obvious reasons.

Results have to be interpreted, too, obviously, and this is where qualitative thinking is paramount. The numbers are just that: a measure of similarity along multiple parameters. What they mean depends on the case at hand.

These are some of the qualitative features taken into account in addition to the quantitative analysis.

"One of other confusing points is the assumption about shared style--that it's all or nothing, or that style doesn't change over time or may not change deliberately etc. (it's a whole host of questions really)."

Yup. Writers often change style as they age. I wouldn't be surprised if, say, Austen's later texts were different from her earlier ones.

"I can't" and "I did not"

Yeah. For one thing, Faktorovich limits herself to the six most common three-word strings in the text. That's basically useless. Proper authorship and stylometry studies routinely look at many hundreds of the most common word/phrase/string. I'm not exaggerating. Hundreds.

For another, those three-word phrases are red herrings. By her own admission. She says in >119 faktorovich::

"The 3-word-phrases (the top 6 most-frequently-appearing of these out of all possible 3-word phrases in a given text) I pointed out as revealing in the Brontes' case are not one of the 27-quantitative tests involved in the basic method. I collected these 3-word-phrases for all texts and used them to find obvious patterns of phrases that only appear in the work of any given authorial-style. The use of a contracted vs uncontracted "I can't" is a significant stylistic divergence that is revealing when one of these appears among the top-6 most-common phrases because this means there are many instances of this preferred usage in the text vs. the alternatives. I did not use any 3-word-phrase patterns to establish attributions, but just used them in the writeups to show that the less easily understood mathematic matches vs. non-matches were also confirmed by these verbally-descriptive elements."

So she's looking at the top 6 most frequent three-word strings. No word on whether the three-word string in question is number 7 in the other Brontë's text, or the tenth. No word on absolute frequencies, either -- is the difference four times versus two? (i.e. basically irrelevant) Forty versus two? (Potentially interesting.) And in reality, she's not even looking at these phrases at all! She arrived at her authorship attribution using a paltry 27 criteria. And in order to make the mathy numbers more understandable to her audience of non-specialists, she looked at the top six three-word phrases and used those as shorthand, deciding that they were equally diagnostic as her more mathy results. Again -- no word on absolute or relative frequencies. Just the fact that they are among the top six in one author and not in another (though perhaps in the top seven? top ten?) is treated as corroboration of her "mathy" tests.

It's not top six that is important, it's differences in proportions. And she admits that she hasn't actually looked at the actual differences in proportions for these strings -- only top six vs not top six. It's ridiculous!
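
For what it's worth, getting the full trigram distribution (with real proportions, not just a top six) takes a handful of lines of R; a sketch, with a hypothetical file name:

text  <- tolower(paste(readLines("jane_eyre.txt", warn = FALSE), collapse = " "))
words <- unlist(strsplit(text, "[^a-z']+"))
words <- words[words != ""]
n     <- length(words)
trigrams <- paste(words[1:(n - 2)], words[2:(n - 1)], words[3:n])
freqs    <- sort(table(trigrams), decreasing = TRUE)
head(freqs, 20)                    # absolute counts, not an arbitrary top six
head(freqs, 20) / n * 10000        # per 10,000 words, for cross-text comparison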

144Aquila
Modificato: Dic 7, 2021, 12:21 am

Were books edited in the Brontës' time the way they are now? Because things like contractions can be very dependent on editorial decisions, I could even see a typesetter working off a handwritten manuscript making particular decisions about contractions.

145Petroglyph
Dic 7, 2021, 12:25 am

>142 melannen:
Sounds intriguing. Hit me up if you want me to apply some inscrutable methodology to your data that'll confuzzle the masses in their un-eliteness.

146MrAndrew
Dic 7, 2021, 1:38 am

>122 amanda4242: Oh my heavens, you ghost-wrote all those authors' novels? I'm a huge fan!

147MarthaJeanne
Dic 7, 2021, 1:41 am

>144 Aquila: I was thinking that, too. Some publishers have certain words, phrases, and spellings that they insist on or don't allow as part of a 'house style'. Whether the printed work uses cannot, can not, or can't is probably a lot more dependent on publisher and typesetter than on the preferences of the author. It's also not something an author is likely to get uptight about. There are plenty of other issues for that.

148Maddz
Dic 7, 2021, 2:31 am

>147 MarthaJeanne: I was thinking the methodology is more likely to be identifying scriveners or clerks making fair copies for the printer's use from the original manuscript complete with spelling errors, amendments, tiny text (to fit more words on the page), interpolations and annotations and so on. In other words, it's the publisher taking the author's working copy and turning it into the printer's working copy.

It would then make sense that it would look like 6 ghostwriters on the grounds that this would be consistent employment for a clerk, who could be working on multiple manuscripts throughout a year.

149spiphany
Modificato: Dic 7, 2021, 3:18 am

>144 Aquila: I also have some big questions about how standardization of spelling and punctuation, or rather lack thereof, is being taken into account in the analyses.

I work as a copyeditor, and my experience is that writers -- even writers working in this modern age of computer word processors -- aren't particularly consistent. It seems to me that some elements of her analysis (questions of word choice, etc.) would need to recognize that "traveller" and "traveler" or "&c" and "etc." are the same words, whereas other ones (identifying writerly idiosyncrasies) might need to treat them as different.

This is particularly relevant since she is working with texts that were written at a time when the written language was treated as rather more fluid than we do today, and as a result the spelling and punctuation choices of 17th-century authors are likely to seem somewhat arbitrary to contemporary readers. Anyone who has read a non-modernized version of Shakespeare or Chaucer will have noticed this.

On top of this there are issues of textual history and editorial preparation of texts, which may further obscure the "signatures" that the analysis depends on. In an era in which manuscripts were handwritten and either copied by hand or manually typeset, a lot of variation creeps in, some of which is not author intention but may be dictated by mundane reasons like the typesetter using an abbreviation or a shorter spelling to cram in a word at the end of a line.

When these texts are published today, they therefore undergo editing in one form or another. Depending on the intended readership, this may involve making changes to unify or lightly modernize spelling and punctuation, or it may involve comparing different versions of the text (e.g. First Folio/Second Folio) as part of the creation of a scholarly edition including a commentary that lists variants in the textual tradition and indicates missing/illegible words and similar matters.

It isn't at all clear to me which versions of these British Renaissance texts are being used, except that they are versions freely available online. This doesn't tell us anything about the editorial preparation of these texts.

150Petroglyph
Dic 7, 2021, 3:35 am

>149 spiphany:
If you look at her github repo, you'll find her bibliography here. (click on "view raw" to download a .docx file). I see a lot of Project Gutenberg and Early English Books Online.

151spiphany
Dic 7, 2021, 3:42 am

>150 Petroglyph: My point was more that she doesn't mention editions at all when discussing her methods, and this hardly seems irrelevant to me given the era in which these texts were written. "Freely available online via Project Gutenberg" isn't a particularly compelling criterion for selecting which edition to use.

152Bushwhacked
Modificato: Dic 7, 2021, 3:56 am

>70 paradoxosalpha: "I'm just sitting here with a well-buttered barrel of popcorn" - that's the funniest thing I have read in this thread so far!

153Bushwhacked
Modificato: Dic 7, 2021, 5:02 am

>106 susanbooks:: re: " OMG when will this end" ... well said! The lady is certainly intent on defending her thesis! I suspect it will end when the buttered popcorn runs out...

154Petroglyph
Dic 7, 2021, 3:57 am

>149 spiphany:
Ok, I got curious how stylometry papers on Elizabethan plays handle these things and I just looked at this paper: Eisen, Mark, Alejandro Ribeiro, Santiago Segarra, and Gabriel Egan. 2018. ‘Stylometric Analysis of Early Modern Period English Plays’. Digital Scholarship in the Humanities 33 (3): 500–528. https://doi.org/10.1093/llc/fqx059.

(if you encounter a paywall at that link, there's a pdf link here).

Segarra et al. say this on p. 3: "When using original transcriptions we have to account for the fact that many words had multiple accepted spellings during the Early Modern era. E.g., the word ‘of’ is also spelled as ‘off’, ‘offe’, or ‘o’ whereas the word ‘with’ may also appear as ‘wid’, ‘wyth’, ‘wytt’, ‘wi’, ‘wt’, and ‘wth’. Many of these alternate spellings are used infrequently and thus do not contribute highly to the WAN of a text. Nevertheless, we correct the WANs so that the occurrence of any of the alternative spellings is treated as the same word. We emphasize that spelling preferences carry little information about the authorship of a play. Indeed, spellings in printed editions were not necessarily those of authors as they were often selected by printers to accommodate the fixed length of lines in printing presses [33]."

The reference behind [33] is this: Philip Gaskell, A new introduction to bibliography, Clarendon Press Oxford, 1972.

In other words, these spellings are unified -- there's lists of common alternate spellings of function words, and an algorithm replaces the alternatives with the "standard" one. Because they're more the printer's area than the author's, as you pointed out.
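
In code, that unification step is nothing more than a lookup table applied before any counting; a small sketch using a few of the variants quoted above:

# Map known variant spellings onto a standard form before counting word frequencies.
variants <- c(off = "of", offe = "of",
              wid = "with", wyth = "with", wytt = "with",
              wi  = "with", wt   = "with", wth  = "with")

normalise <- function(words) {
  hit        <- words %in% names(variants)
  words[hit] <- unname(variants[words[hit]])
  words
}

normalise(c("wyth", "loue", "off", "trewe"))   # "with" "loue" "of" "trewe"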

155AnnieMod
Dic 7, 2021, 3:58 am

I may have missed it somewhere, but all the texts being discussed seem to be the ones identified as misattributed.

What texts had been used to prove that the algorithm actually can recognize proper attribution? Surely, anyone proposing an algorithm has done extensive testing with both positive and negative use cases.

So how about sharing 5 pairs/groups of texts from the same time (as they would have undergone the same level of editing and so on during printing) which pass this test on attribution? I cannot imagine that a properly run research has not gone through a lot more than 5 sets of these.

156Bushwhacked
Modificato: Dic 7, 2021, 4:27 am

>6 Crypto-Willobie: Down here in Australia I believe our equivalent colloquialism is "bullshit". Certainly by the intensity of the thread it's "baffling brains". The lady certainly is intent on defending a position that would appear to be under heavy fire! I note she even ripped into Mr Spalding's light hearted response. Perhaps with all the references in this thread to the Bronte sisters we may conclude the good Doctor has a "bee in her bonnet".

157Bushwhacked
Dic 7, 2021, 4:31 am

>9 DuncanHill: ... it's always a good day when I learn a new word! "Nephelokokkygia" ... I thank you, sir!

158Bushwhacked
Dic 7, 2021, 5:56 am

>79 anglemark: I think Erich Von Daniken must be due for rehabilitation as well!

159Bushwhacked
Dic 7, 2021, 5:58 am

Perhaps the good Doctor's theory could be best summed up as "I reject your reality and substitute it with my own".

160MarthaJeanne
Modificato: Dic 7, 2021, 6:11 am

>102 faktorovich: " it is illogical to claim that any writer could have made a living as a "professional" writer by only publishing as few as a single book or staging a single play in a lifetime.

But many people publish one or two books without ever intending to make a living as a writer. They may have spent years writing their one book idea as a hobby. Or thought they might like to be a writer until faced with the realities of how much work is involved after the writing and how little profit there is in it. And even those who still want to be professional writers find that they still need a bread job. This is true now, and I think it was probably true then. If you look through LT, you will find lots of authors only credited with a single book.

161Petroglyph
Dic 7, 2021, 9:20 am

I would like to add, too, that the stylometry package melannen and I talk about (e.g. in >134 melannen:) comes with a sample of full-length novels by different authors, for learning and testing purposes (before users learn to prepare and import their own datasets). From the notes:

This dataset contains a selection of 9 novels in English, written by Jane Austen ("Emma", "Pride and Prejudice", "Sense and Sensibility"), Anne Bronte ("Agnes Grey", "The Tenant of Wildfell Hall"), Charlotte Bronte ("Jane Eyre", "The Professor", "Villette"), and Emily Bronte ("Wuthering Heights")

Somehow the multiplicity of authorial voices in the Brontë novels has gone completely unnoticed in a standard set of test novels that's been used by nearly every user of this package. Curiouser and curiouser!

162melannen
Dic 7, 2021, 12:32 pm

>145 Petroglyph: I will keep it in mind! I have a few thousand more pages of defensive wall o' text to write first I think.

>149 spiphany: et al.: Actually, similar stylometry has been used to try to distinguish between editors/compilers/typesetters and things like which actors dictated some of the worse Shakespeare folios. With handwritten manuscripts (like the Voynich, seriously this time) it's also been used to even distinguish between copyists or different scribal centers.

It would be interesting if this methodology was actually detecting something like different Project Gutenberg copyeditors or Elizabethan print shops' punctuation standards (and hey, typesetters and copyeditors get even less credit than ghostwriters!) But given the description of the methods I suspect any signal it does have is completely drowned in noise and other signals that can't be separated out.

163faktorovich
Dic 7, 2021, 12:54 pm

>137 Petroglyph: My method combines 27 different tests. Each of these tests has a different measuring system. For example, some are for the number of punctuation marks per 100 sentences, while others are for the percentage of passive voice. These numbers cannot be compared to each other without changing them into the same type of measurement. The point of the experiment is to compare all texts in the corpus to each other in similarity. So, the goal can be reached by simplifying the data into similar and dissimilar data-points. While the data is simplified in this step, the 27 tests are then combined, creating 27 times the precision of merely counting any two texts as being within 17-18% of each other or outside this range. The 17-18% range arises because in a spreadsheet I select the same number of rows above and below a given data point to choose the similar or proximate texts, so there might be 2 above and 2 below, or 5 above and 5 below, whereas a cutoff of precisely 17 or 18% might have meant having to choose between 4 below and 5 above or 5 below and 4 above; with an equal number below and above, the percentage they add up to ranges between 17 and 18%. I have found this range to be statistically significant because it is over the 1/5 odds that are pretty difficult to beat in a random roll of the dice. It also simply works best in identifying similar texts when I have applied it to practical experiments. While it might seem that linguistic density is more significant than the top-6 characters, when I have tested this assertion, I have found that either of these tests can be the best measure of similarity for a given couple of texts, or can have a glitch where it shows similarity while most of the other tests show a divergence. My method works because the combination of these different tests minimizes the impact of even a few glitches on any of these tests from incorrectly changing the attribution.

164faktorovich
Dic 7, 2021, 2:05 pm

>138 Petroglyph: You have avoided answering my question by instead insulting my intelligence. "Statistics" just means the analysis of data. My 27-tests method thus uses statistics as it analyzes data. I simply disclose the exact steps and tools I use in this analysis, which makes it replicable by anybody in the public, and your methods do not. No, stats are not too hard for the average person. There are free online tools that make even extremely complex statistics easy: you just plug in the numbers and the software applies the relevant formulas. Thus, a method is not uniquely sophisticated if it asks users to use inaccessible software instead of software that can solve the same problems with other steps and is available for free.

The explanation you give after accusing me of distrust is nonsensical when one defines the terms you are using. Here's what you're saying: these methods use linear equations (2 dimensions between 2 points) to simplify a corpus tested for frequent words into a two- or three-dimensional (X-Y) plot diagram. You are thus incorrectly defining the steps involved in PCA, which basically standardizes the data from multiple dimensions into a simple 2-dimensional plot that allows all types of comparison to be visualized together. My method performs a similar standardization of the data from 27 dimensions, or 27 different test-types, into a single table (which can be visualized in a graph as well) that compares each of the texts against each of the other texts on their degree of similarity across the 27 tests put together. In contrast, most previously attempted computational-linguistic methods just test for word frequency - one of my tests; they make this simplicity seem complicated by calling each word they test for an n-gram or a dimension of the process.

There are 7.8 million words in the corpus of 284 texts that I measured from the Renaissance. I tested all of these 7.8 million words, so I could say that there were 7.8 million dimensions or features, and could also multiply this number by 26, as I performed 27 different tests and only one of them was for n-grams (words), and one for n-grams (letters), etc. Why would words frighten me? Are you forgetting that 12 volumes of my series are translations of previously untranslated Early Modern English into Modern English, with extensive annotations that explain linguistic choices etc. made in these texts?

165faktorovich
Dic 7, 2021, 2:33 pm

>140 Petroglyph: You are repeating yourself without reading my responses. I stated previously that there is no logical reason for anybody to use a computer language to create new software to test linguistics when these tools for counting punctuation etc. have already been built. When I have reviewed R and other tools, I have concluded that they only do simple tests for words and letters, and the language describing these steps just makes them sound like thousands of comparisons by counting each word in the texts, or each unique word in the text, or the like. Can you list a hundred or even a thousand of these tests this tool can perform? Why are you referring to these tests abstractly? What thousand tests can possibly be performed on a corpus? There aren't enough different relevant linguistic categories to hit even a hundred tests; there are only so many different punctuation marks, word-types, etc.

I also explained before that I refrain from testing post-1926 works because they have not been digitized yet and are not available in full text in the public domain; this is why I can't test your proposed M. initials mystery. And as I said in the rest of my reply, which you ignored, it is possible there is indeed a mystery behind the M. initial, as another author could have used this distinction to profit off a popular name, so the name similarity alone is not enough to establish certainty in the authenticity of any given set of bylines.

The accuracy of any computational-linguistic method increases as the word-size of a given sample increases. Any samples under 1,000 words are likely to produce too many glitches. The size of Ulysses is also enormous in terms of word-count, whereas the other Joyce texts are small, so this corpus would be skewed towards having overwhelmingly more Ulysses, and this is likely to create additional glitches in terms of similarity.

My method does not choose which texts are included in the corpus. The researcher makes this selection. You are saying that if a researcher deliberately searches for contemporaries in the same circles as the tested author X, the researcher will find a ghostwriter in this mix. This assumes that most popular/research-worthy texts are ghostwritten. If there was no prevalence of ghostwriting, then testing authors in a shared circle would not show several bylines sharing a linguistic signature.

I am willing to test Ulysses against all digitized Joyce texts, in full and in pieces, and against around 30 other texts from Joyce's circle to determine attribution patterns, if you agree to perform tests of your own on the same corpus of whole and chunked texts; we can post the results on GitHub and link those results to this discussion. Just email me so we can confirm the texts we are using and the chapter-breaks etc. we will be applying.

No, my method does not consider "shockingly low similarity" to be proof of similarity. Just the opposite is true. I have no idea why you are saying nonsense and claiming I said it. My method does create a linguistic "profile" when it groups texts with a shared linguistic signature together and establishes the elements that the texts in that group tend to share. Your approach is flawed because you believe all of the bylines, even if texts within these bylines do not actually match each other or register as similar when quantitative comparisons are applied to them. If you believe all bylines, you have errors that are ingrained in your method that lead to a re-affirmation of established bylines instead of to the true bylines. Your method makes biased assumptions, whereas mine is entirely logical and unbiased as it ignores all bylines until after the math establishes linguistic similarity and divergence.

166faktorovich
Dic 7, 2021, 2:46 pm

>141 melannen: Advanced statistics was a required course in my undergraduate economics degree, and I received an A. I have a photographic memory, and I am sure we did not learn anything about any programming languages in that course; we did all statistics by hand or with a calculator. I have no idea what statistics courses would cover programming of any sort. Statistical programs are written by programmers so that statisticians can just use these programs without each of them inventing a program of their own to perform the same basic statistical steps. Programming is irrelevant to measuring linguistic features because the tools for these measures have already been programmed and are available for free. I hope you will stop repeating the same nonsensical criticism.

167faktorovich
Dic 7, 2021, 3:07 pm

>143 Petroglyph: The opposite of your statement regarding the significance of the top-6 most-frequent words is true. In contrast, it is your approach of counting the "many hundreds of... words/phrases" that is "useless". The data for the top-500 most-common words becomes impossible to evaluate fully and without bias, because most of the same words appear in all of the texts, while as many as half or more of the words appear only once, in only one of the texts. Methods that use this approach never explain what logic is used to establish similarity when this many words are compared. Are all single-occurrence words discarded as irrelevant? Are the top most-common words discarded as well because they are too common? This is a common element in research of this type, as researchers are proud to have taken the most common words out of the comparison. In contrast, my 6-words method derives a couple dozen common top-6 word (and letter) patterns. For example, "Charlotte" and "Emily Bronte", between their three tested novels, both share the a-pattern in words, which includes these six most-frequent words: the, I, and, to, a, of. These patterns reveal the personality-based preferences of the author, which do not change significantly even across genres.

Your criticism of my 3-word-phrase analysis is again nonsensical. The top-6 most-frequent 3-word phrases are obviously 1st-6th in frequency, not 7th or 9th. I ignore all other 3-word phrases for this verifying test so that my analysis is standardized and I am not occasionally adding bias to my method by considering, as you suggest, the 7th or 9th phrases alongside the most common ones. The size of the text determines the relevant absolute frequencies, so it would be irrational to compare absolute frequencies across a corpus of 284 texts that vary between 500 words and over 300,000 words.
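The counting itself is elementary; even in a programming language like R, which I maintain is unnecessary, the whole of the top-6 words test and the top-6 3-word-phrases test amounts to a few lines of counting (a rough sketch only; "sample.txt" stands in for any plain-text file):

    # Six most frequent words and six most frequent 3-word phrases in one plain-text file.
    text  <- tolower(paste(readLines("sample.txt", warn = FALSE), collapse = " "))
    words <- unlist(strsplit(text, "[^a-z']+"))      # split on anything that is not a letter or apostrophe
    words <- words[words != ""]                      # drop empty tokens left by the split
    head(sort(table(words), decreasing = TRUE), 6)   # top-6 words
    n <- length(words)
    trigrams <- paste(words[1:(n - 2)], words[2:(n - 1)], words[3:n])
    head(sort(table(trigrams), decreasing = TRUE), 6) # top-6 3-word phrases

The free counting tools I use report the same kinds of numbers without any of this.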

168faktorovich
Dic 7, 2021, 3:08 pm

>144 Aquila: The Brontes' novels were published by the same publisher, and thus the same typesetter/editor worked on them; their editorial changes would have standardized all of the texts to include or exclude contractions, so none of these texts should stand out from the others if contractions were something the editor had standardized.

169faktorovich
Dic 7, 2021, 3:13 pm

>147 MarthaJeanne: It is extremely difficult for an editor to adjust the percentages of punctuation and word choice enough to change the signature to match the editor instead of the author, though I have seen it occur in the "Shakespeare" corpus, where modernizing/translating editors have changed the signature from "Shakespeare's" original style to their own. If this rate of editing had occurred in the "Bronte" texts, it would imply extreme rates of misspelling and grammatical error, similar to a change of the language from Early Modern English into Modern English. If we assume the author or the ghostwriter did most of their own work without extreme intervention from an editor, editorial input does not alter the attribution results of my 27 tests.

170faktorovich
Dic 7, 2021, 3:19 pm

>148 Maddz: There is absolutely no relationship between who transcribes or copies a manuscript and the linguistics the original author used in the text that is being copied. There is an enormous quantity of spelling errors in the final published texts from the Renaissance, so if there were any clerks, they were not editing these texts for spelling errors, at least not on the scale you imagine. I found one curious text where the transcribers/typesetters appear to have inserted spelling mistakes deliberately in a set of six or so different printings of a single book in single-copy runs. You really should read my series to understand why your argument is false. The 6 ghostwriters were the writers, not typesetters who edited the texts they were typesetting so heavily that they imposed their own authorial style.

171faktorovich
Dic 7, 2021, 3:31 pm

>149 spiphany: Most of the texts I tested from the Renaissance were taken from the Early English Books Online database. I switched to preferring this platform after I realized that modernized-spelling versions did not match the original-spelling versions, because modernizing editors had altered the linguistic style. Based on my tests, when most of the texts in the corpus use old spelling, the glitches from alternative spellings occur in similar proportions in all texts, and this becomes an insignificant abnormality that does not prevent the recognizable modern-spelling words from being classified by the counting software. While the Renaissance exclamation/question rates were different from those of modern authors, they show patterns among themselves that are easily recognizable as these authors' unique styles.

And it is unclear if all manuscripts were first handwritten, as some plays might have been composed directly during typesetting; if all plays had been handwritten first, more of the original handwritten manuscripts should have survived from this period.

Yes, I am aware of scholarly editions that are made to explain spelling variations. As you appear to have failed to notice, my Re-Attribution series includes 12 volumes (so far) of these scholarly translations, with annotations that explain variations in spelling, invented words, typos, differences between editions, and the like.

The full bibliography of the texts I used and their sources is available on GitHub as well as in Volumes 1-2 of the series. I describe the process I used for cleaning up the texts to delete all content by modern editors, glitches created by digitization, etc., in the methodology chapter of Volumes 1-2. I used a systematic method that allowed the texts to be compared in a form as close to their originals as possible.

172faktorovich
Dic 7, 2021, 3:32 pm

>150 Petroglyph: Thanks for pointing this out.

173faktorovich
Dic 7, 2021, 3:35 pm

>151 spiphany: You are referring to a test on Twain, Dickens, etc. that I ran in around an hour, based on a request made by somebody in this chat. These Twain/Dickens texts do not have the rate of spelling variance present in Renaissance texts, and they have not been edited much since their original publication in the 19th century. It is thus perfectly logical to use their Project Gutenberg versions, whereas for the Renaissance corpus it is better to use EEBO. The editions I used are listed in the bibliography included on GitHub.

174faktorovich
Dic 7, 2021, 3:54 pm

>154 Petroglyph: The explanation from Segarra is illogical. They point out that they have found varied meanings for the same spellings, such as "of" and "off", but then they propose standardizing all spelling to one of these variants in all cases; this would introduce many more errors than it would fix in terms of the software's processing of word-types etc. I have found it is best to leave the original spelling alone, only editing glitches created by digitizing software and the like. It also makes no sense to claim that authors had no input into the spelling of the words in their texts, as this suggests authors were dictating these books orally to a printer who chose the spelling.

175faktorovich
Dic 7, 2021, 4:08 pm

>155 AnnieMod: As I explained before, most of the texts with the bylines of the 6 ghostwriters matched each other in the Renaissance corpus. For example, all three of Ben Jonson's tested plays (Volpone, Sejanus and Every Man) matched each other as well as the rest of Jonson's linguistic group, or the texts Jonson ghostwrote under other names or anonymously. You can see this in the 284-texts "Publishers" table on GitHub. All of Percy's self-attributed texts, the two plays ("Cuck-Queans" and "Fairy Pastoral") as well as the one sonnet collection he published, matched each other, as well as Percy's ghostwritten etc. projects. The four texts tested from Josuah Sylvester all matched each other linguistically, and a couple of them were collections that included several texts rather than just one text attributed to Sylvester. Both of the rare texts Verstegan self-attributed that were tested also matched each other, as well as the rest of his group. As did 2 out of 3 Harvey texts, the one exception being a letter exchange that Harvey had with Verstegan, which primarily matched Verstegan's signature without Verstegan's name appearing in a byline because Verstegan had been exiled from England. Only 1 poetry collection of Byrd's was available for testing, so it could not be compared with other texts of Byrd's. Similar tests on most other bylines proved that multiple ghostwriters wrote the texts with a single shared byline. This rare in-byline consistency for the ghostwriters contributed a small part toward proving the case for them being the underlying ghostwriters.

176susanbooks
Modificato: Dic 7, 2021, 4:25 pm

>168 faktorovich: "The Brontes' novels were published by the same publisher"

In fact, they were not. Emily's novel & Anne's first novel were published by Chapman & Co. Charlotte's novels were published by Smith & Elder. Later, Smith, w/Charlotte's encouragement, gathered all of the Bronte novels & published them but the originals were not at all pub'd by the same house. Any brief biography will tell you that.

It will also tell you that Charlotte & Anne went to London to present themselves to George Smith to prove they were indeed the writers of the novels he was publishing. He was satisfied. Manuscripts, letters, diaries all confirm the Bronte authorship. Further, research might have shown you that Branwell Bronte was assumed by many to be the author of his sisters' novels bc no women could have written such racy books. No one believes this now. Except you, I guess -- but you aren't even proposing Branwell but A B Nichols, who was considered a bit of a dimwit (even by Charlotte herself, though she eventually married him). If he was a co-author of the texts, you'd think he'd have cared about the Brontes' literary estate, rather than stuffing everything in an attic to mold over. There are so many historical & biographical reasons you're wrong, but keep pushing those numbers -- at least it amuses you, if no one else.

177melannen
Modificato: Dic 7, 2021, 4:34 pm

>166 faktorovich: I'm sorry your one undergraduate statistics course didn't go into the kind of tools that are used by people working with statistics in real life! Statistics teaching in the US is really bad - it's a problem because it leads to people misunderstanding what statistics can do and not realizing when they're misusing them. My sister just finished her PhD and basically had to teach herself how to use R for her research because it was standard in her field but she hadn't had any stats classes since undergrad either. But since she'd taken the trouble to learn about what other people researching the same thing were doing, she knew that R was something she needed to know about.

(R is a program as well as a programming language - you don't have to write programs from scratch to use it! And packages like Stylo exist precisely so statisticians don't have to reinvent the same methods from scratch over and over.)

178susanbooks
Modificato: Dic 7, 2021, 4:23 pm

>177 melannen: But she got an A in an undergrad class!

179AnnieMod
Modificato: Dic 7, 2021, 4:56 pm

>175 faktorovich: The GitHub contains the list of the texts you used for this specific research on the alleged mis-attributions where you applied the tests/algorithm. What I am asking about is a few sets of texts which were used when the algorithm/tests were developed and tested. Surely they were tested on a much larger body of works with clearly defined control sets (both negative and positive) which prove that the algorithm does what it claims to do before applying it on texts whose attribution is alleged to be wrong and using it to determine authorship.

Or were the 27 tests built to look for specific things in the Renaissance writers' works based on what the developer of the tests expected to see, and fine-tuned for these specific works? If so, how was it decided which of the texts are correctly attributed and which are not, so that the control tests could be run on them?

If the tests are not fine-tuned for the period, what happens if you grab 100-200 Victorian novels and run the process on them - does it differentiate the authors properly? How about 100-200 novels from the early 20th century? Or any other period which has enough texts available online. I cannot imagine a published study skipping at least that much due diligence, so that should be a very quick answer?

180bnielsen
Modificato: Dic 7, 2021, 4:53 pm

>178 susanbooks: In advanced statistics, even!

181abbottthomas
Dic 7, 2021, 5:03 pm

Shouldn't feeding time be over by now?

182faktorovich
Dic 7, 2021, 5:25 pm

>160 MarthaJeanne: An author only writing a single book in a lifetime is equivalent to a whale only having one meal in a lifetime. To spend a decade writing only one book even while holding a full-time job means writing on average 27 words-per-day for a 100,000-word average-sized book. The previous sentence has 25 words in it. So, you're saying an author sits down, writes a sentence, and then does no writing for the rest of the day? Is this really a believable concept for any human that has a passion for writing?

183faktorovich
Dic 7, 2021, 5:27 pm

>161 Petroglyph: This means you have an erroneous testing method that failed to distinguish similarities because it based its attribution on the bylines, instead of allowing the linguistic data to determine the attributions.

184faktorovich
Dic 7, 2021, 5:38 pm

>176 susanbooks: Again, I have written 2 pages about the Brontes, and you are focusing on them as if they were the center of my research. "Emily" is the byline that disappears in my analysis, which merges it with "Charlotte's" style, whereas "Anne's" style is divergent from both; so if "Emily" and "Anne" were published by the same publisher, the objection that different publishers could have introduced elements that caused the divergence in their styles is incorrect. This was the point of my statement; I did not list the full publishing history of these texts, which is irrelevant to this point. Your argument about Nichols being a "dimwit" is extremely biased and irrational. What proof do you have of his stupidity, or of his isolation in an attic? And if he was isolated in an attic, don't you see a similarity between this biographical point and the madwoman in the attic in "Charlotte Bronte's" Jane Eyre? I already explained why the "Bronte" documents being sold by Sotheby's at this time were never previously available for handwriting analysis (having instead been hidden in closed archives), and why they were likely to have been forged by the forger in whose hands they were first found.

185faktorovich
Dic 7, 2021, 5:43 pm

>177 melannen: Now you are insulting my statistics professor? It was the most difficult statistics class available at UMass (an R1 university); the teacher had written the leading statistics textbook at the time, and I had to take calculus before I qualified for the class, so it was not basic statistics by any stretch, but rather equivalent to what I would have learned in graduate statistics classes.

If the R program were indeed easy to apply to perform linguistic tests on a corpus, you would be able to write a paragraph summarizing the steps we can all take to use this tool, similar to the basic steps I provided for my method. You have not attempted to do so because these steps are deliberately withheld from the public to force literature scholars interested in attribution to hire computational linguists to do the work for them, without access to the steps needed to check whether these computational linguists made any errors in their analysis.

186faktorovich
Dic 7, 2021, 5:44 pm

>178 susanbooks: I also had a 4.0/A GPA in my graduate PhD program. Are you equally excited about that achievement as well?

187cpg
Dic 7, 2021, 5:48 pm

>176 susanbooks: "Emily's novel & Anne's first novel were published by Chapman & Co."

Wikipedia seems to say that they were published by Thomas Cautley Newby.

188faktorovich
Dic 7, 2021, 5:51 pm

>179 AnnieMod: My 284-text corpus is the largest number of texts ever tested in any computational-linguistic study of the Renaissance to date. And you are asking if I tested my method on other texts to check its validity? Sure; it so happens that I previously tested the method on 18th-century British texts (that study is forthcoming after I finish the second half, or another 14 volumes or so, of the Re-Attribution series), as I mentioned earlier, and then on 19th-20th-century texts (this data is available on the GitHub page). You have failed to read my earlier replies before making these criticisms.

While the 27 tests were chosen because they fit the Renaissance texts better than the 18th-century ones, these tests could not be manipulated with any type of bias, as they were calculated with basic counting software. For the 18th century, measures such as words-per-sentence could be used because there were few plays in that corpus, whereas this measure was not a good fit for the frequent broken play-lines in the Renaissance. The corpus was expanded gradually, and at each expansion point all newly added texts were compared against the texts added earlier; this constant re-testing repeatedly confirmed the earlier findings.

189faktorovich
Dic 7, 2021, 5:56 pm

>187 cpg: Thanks for the correction. It is irrelevant who published them for the specific attribution test that I ran. My goal was not to solve who the ghostwriters were behind these "Bronte" texts, but simply to point out that the masculine bylines used on the original editions were more likely to be the accurate gender-assignment. To establish whether their publishers/editors/relatives etc. were the underlying ghostwriters, all of these potential authors would have to be tested linguistically and biographically. Checking the name of the original publisher(s) only addresses the issue I raised: based on the evidence you are presenting, the linguistic signatures do not show a publisher-specific pattern.

190prosfilaes
Dic 7, 2021, 6:21 pm

>182 faktorovich: And yet it happens. I've got a LibraryThing page for two pages in a book. People write one book and lack the impetus to write another. https://tellersofweirdtales.blogspot.com/2018/11/f-georgia-stroup-1882-1952.html tells of a woman who wrote one story (a great one, by my tastes) and one nonfiction article, and apparently nothing else that was ever published.

And this is why I found your claim about mathematical purity obliterating bias so frustrating. If you believe that every author is a full time author, then it's not surprising your results show that a handful of authors wrote a vast array of works.

>184 faktorovich: This, like several other posts in this thread, shows a carelessness in reading others' texts. Susanbooks did not say anything about Nichols being isolated in an attic.

191lilithcat
Dic 7, 2021, 6:24 pm

>102 faktorovich:

it is illogical to claim that any writer could have made a living as a "professional" writer by only publishing as few as a single book or staging a single play in a lifetime.

Harper Lee

192faktorovich
Dic 7, 2021, 7:35 pm

>190 prosfilaes: I am not disputing that some author somewhere has written only one book or one article in a lifetime. I am defending against your false assumption that the large number of texts I am re-attributing to the Renaissance ghostwriters is by itself sufficient to discredit my conclusion. While across the history of writing some authors might have been unprofessional or part-time writers, as I mentioned earlier in this exchange, under the vagabond laws independent authorship without an aristocratic patron was illegal in the Renaissance. So the past claim that hundreds of lower-class Renaissance people managed to publish books on contract for payment without any affiliated aristocrats is the obviously false claim, not my interpretation of this period as a time when six ghostwriters fraudulently bypassed the vagabond law by using multiple pseudonyms to write for a living.

193faktorovich
Dic 7, 2021, 7:37 pm

>191 lilithcat: It is possible that if I tested your Harper Lee counterexample, I would find that the style in Mockingbird matches that of other bylines, or Lee might indeed have had only one book in her. Naming an author known for only one book cannot prove or disprove the odds that anybody who wrote only a single book used a ghostwriter.

194Keeline
Dic 7, 2021, 8:12 pm

I recall well the enormous amount of work required to take the Project Gutenberg files for the first five Tom Swift books to prepare them for a special 100th anniversary edition in 2010. My wife and I spent several hours for each by making a close comparison with the published copies from our collection. There were an enormous number of changes in punctuation and other details that needed correction.

This is only a single example of why texts from the public domain era may reflect many hands introducing changes in spelling, punctuation, and even altering the words themselves. On many classic works of literature there are variorum editions that document the differences between early texts for a given work.

For this reason, I am a bit skeptical of counts of things like exclamation marks in a single copy of a text from an online source.

In this long thread I have seen references to 27 tests but no listed summary of those tests.

In my studies of Edward Stratemeyer, himself an employer of ghostwriters through his Stratemeyer Syndicate, I have read his correspondence with one of his principal publishers, Warren F. Gregory of Lothrop, Lee & Shepard of Boston. He complained about the editorial and typesetting changes made to his typed manuscripts, and the publisher defended some of them.

Further I have seen publisher manuals from the early 20th Century which gave details on the style rules applied by that publisher when they would typeset a book. Some of this same information was formalized in "manuals of style," some of which are still followed today.

It is perhaps not thought of by all readers, but it should be recalled that there are many hands and minds between the "author" and the printed page seen by the reader. These can influence the final product and introduce noise that makes it harder to get back to the original author and their pure output prior to self-, typesetter- or publisher-editing.

James

195Petroglyph
Dic 7, 2021, 8:19 pm

>194 Keeline:
From the interview: "punctuation, lexical density, parts of speech, passive voice, characters and syllables per word, psychological word-choice, and patterns of the top-6 words and letters"

Plug any long-ish text into one of the three online services she lists in step 2 in that interview. The tests originate from there.

196Petroglyph
Dic 7, 2021, 8:25 pm

>165 faktorovich: "Can you list a hundred or even a thousand of these tests this tool can perform. Why are you referring to these tests abstractly. What thousand tests can possibly be performed on a corpus; there aren't enough different linguistic categories that are relevant to hit even a hundred tests; there are only so many different punctuation marks, word-types etc."

Challenge accepted!

Leaving aside like four punctuation marks ("!", "?", ";", "("), and the POS (let's say 9: Noun, Verb, Adj, Adv, AuxV, Prep, Pron, Conj, Interj), and a few thousand word types. Leaving aside avg word length, avg sentence length, function words, content words, unique words, character n-grams and word n-grams and string n-grams (for multiple values of n), passive voice, emotional content, etc. Here goes:

  • potentially diagnostic word endings (-ed, -ly, -n't, 'll, -ing, -ion, -age, -dom, 's, ...)
  • potentially diagnostic abbreviations: etc, eg, ie, viz, ...
  • Sentences starting with "The", sentences starting with "It", senteces starting with "There", ... (This is not the same as testing the distribution of these words across two texts/authors/genres. Specifically: is there a difference in the distribution of how these authors/texts/genres start their sentences?)
  • Sentences ending with: prepositions in general; pronouns in general; on, of, to, by, ...; me, you, him, her, it, ...
  • Sentences starting with conjunctions in general; sentences starting with "and", "if" (as opposed to authors who prefer the if-clause at the end of the sentence), "however", "though", "thus", ...
  • Proportion of Nouns/Verbs, proportion of Adj/Noun, proportion of AuxV/V, ...
  • Combinations of Adj+Adj+Adj+Noun; Adj+Adj+N; Adj+N; AuxV+AuxV+Verb (e.g. "might have done", "could have done", ...),
  • Proportion of Nouns per sentence; Verbs per sentence; Adj per sentence; ...
  • proportion of ",/."; proportion of ";/.", proportion of "!/.", proportion of "?/."
  • Numbers: units, tens, hundreds, thousands, ...; fractions; dates; currency; ...
  • clauses starting with "which", or with "what", or even with "that" (i.e. introducing relative clauses)

I'm sure there are more. Sentences starting/ending with common two-word combinations; sentences starting with common three-word combinations. Diagnostic phrases (One of the reasons they caught the Unabomber was because he said "you can't eat your cake and have it, too" instead of the more usual "have your cake and eat it"). Dialectical features (which can be morphological, lexical, syntactical).

There you go. Hundreds. Not just "counting words and punctuation". Several of these count specific targets in specific contexts (sentence-initial words; sentence-initial phrases; word-final morphemes); several count proportions of various POS and the proportion of, say, questions per declaratives. Just like your test for psychological word choice, counting Numbers is not just "counting words", it is counting a specific subset of words that are more typical of some text types than others. Just like your test for Passive Voice isn't just "counting verbs" (it counts a subset of AuxV+Verb patterns with a specific usage pattern and stylistic preferences), a measure of sentences with relativizers counts a set of clause types that are associated with the complexity of the text and with the depth in which sentences are embedded in each other (more relativizers inside relativizers generally means more complex texts).
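To make just two of the items on that list concrete, here is the kind of counting involved, sketched in R (an untested sketch; "sample.txt" is just a placeholder file name):

    # Share of sentences that open with "The", and question marks per full stop.
    txt <- paste(readLines("sample.txt", warn = FALSE), collapse = " ")
    sentences <- unlist(strsplit(txt, "(?<=[.!?])\\s+", perl = TRUE))  # crude sentence split
    mean(grepl("^The\\b", sentences))         # proportion of sentences starting with "The"
    q <- sum(gregexpr("\\?", txt)[[1]] > 0)   # count of question marks
    p <- sum(gregexpr("\\.", txt)[[1]] > 0)   # count of full stops
    q / p                                     # questions per full stop

Every other item on the list is the same sort of count over a different target.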

The "thousands" would include all the (tens of) thousands word types (all the occurrences of I, you, me, is, are, work, works, worked, working, ...) in the text and check for unique/diagnostic expressions. Not top-six or whatever. Unique words. Again, which ones those are you cannot know a priori. They have to emerge from the data.

Some combination of these factors will uniquely identify a Jane Austen text from an Emily Brontë text, such that if you apply that signature to another of Austen's works it'll correctly identify it as hers with a confidence of over 95%. You could do the same for my comments in this thread and yours. If one of us posts a new comment, a good author signature will correctly identify the new post as belonging to either of us, and if your selection of features is fine-grained enough, you can get shockingly high levels of confidence (95% is kinda low). Some combination of these might distinguish Iain Banks from Iain M Banks (or perhaps the two are indistinguishable -- there is no way of knowing until we try!). Some combination would probably distinguish the individual chapters of Ulysses. Some combination would distinguish your fiction from your non-fiction.

Of course, the vast majority of these will be irrelevant, or rather: their relevance (if any) will be so tiny that they're negligible. But a couple won't! And -- this is the important bit -- you can't determine which ones a priori -- this is something you must determine empirically, by trying them out! By finding exactly which ones work for each author!

Your method uses 27 tests. That is all. They are 27 common ones, to be sure, and they'll get you part of the way. But they will not suffice to correctly and reliably separate hundreds of authors across multiple centuries and multiple genres and multiple social classes and multiple geographic areas and multiple ages and multiple styles and multiple levels of education etc. etc. They are not nearly fine-grained enough, by at least an order of magnitude. You need more fine-grained analyses to even begin tackling a task of this magnitude. And yet, you are limited to the ones that the free online services you use spit out. Your methodology is unable to run tests that the online generators don't hand it ready-made. You are unable to even check whether sentences beginning with "there" are a diagnostic difference between two authors, or whether a certain proportion of "There" sentences coupled with a certain range of passive-voice avoidance and a certain proportion of function/content words is. All your method does is assess some similarity between texts based on only these 27.

Not even on a subset of these, either. No: always these 27 taken together. Whether they are relevant to those particular authors/genres/periods/... or not. Would your method even be able to reveal an author's unique/diagnostic words? Does any of the online services you use do that? Would you be able to if the online service doesn't? Can you run your own unique-word test on a particular text and see if it's different from the generator?

The truth is: You did not select these 27 tests because they were diagnostic, or relevant, or appropriate for 17thC texts, or appropriate for 1800s novels. You chose them because the online tools give them to you. And that is not enough to do the kind of far-reaching research and the kind of heavy lifting that you want them to do.

The "similarities" you find between texts are so wide and so non-fine-grained that they hide diagnostic differences for all of the hundreds of tests that you are unable to test for, and that you are, apparently, unaware of (why else ask me to list some?). The fact that you only use 27 tests for 284 texts between 1560 and 1650 is directly responsible for your method returning a limited number of individual-signature authors.

Incidentally, stylometry software allows you to implement any of the above tests, just like that. When you say in >129 faktorovich: that "The R programming language is not relevant to my analysis because the basic tools for counting the number of punctuation marks etc. have already been programmed and are available for the public to use" this merely reveals that you have no idea what you are talking about. In your interview, you say that "My combination of 27 different tests is thousands of times more accurate than the standard method in this field, which only tests the frequency of common words." Lady, you do not know what you are talking about.

197Petroglyph
Dic 7, 2021, 8:30 pm

Next up: genres.

The similarities that your method reveals between texts can be due to those texts belonging to a certain genre. But someone using your method has no way of knowing.

Some combination of these hundreds of features will be able to differentiate between, say, comedies and tragedies (a bundle of features that are more "chatty" and informal, for instance); or fiction and non-fiction (the latter will have relatively fewer contractions, more passive voice, a greater diversity of conjunctions, greater lexical diversity, and a number of other differences). Non-fiction texts will also contain more numbers, a greater diversity of numbers (units, tens, hundreds, fractions, years, currency, ...), and more function words.

Your method tests for some of these (but not all), and it apparently cannot tease bundles of features apart. There is no way of uniquely and reliably identifying a text as non-fiction with your method. A similarity on, say, function words or passive voice (each of which is worth a 1 in your method) is likely to be cancelled out by, say, another feature scoring a "0". There is no convenient way of making certain features weigh more or less than others.

You've applied the exact same 27 tests to comedies, tragedies, historical plays, letters (I believe), and novels. There is no way that the results for all of these can be correct to equivalent levels.

Now, you may perhaps say that you don't look at non-fiction texts. But that's not the point. This is just an example of a genre that ought to be easily detectable. Your method, if it works, ought to be able to distinguish works that are similar-because-same-genre from works that are similar-because-same-author. I'm sure that Shakespeare's tragedies share similarities with Marlowe's. But your reliance on a) only 27 tests and b) always these same 27 tests prevents you from distinguishing genre similarities from author similarities.

Proper stylometry software is a couple of steps above free online passive-voice counters. It can handle this sort of stuff. Your methodology can't.

198Petroglyph
Dic 7, 2021, 8:36 pm

>165 faktorovich: The size of Ulysses is also enormous in terms of word-count, whereas the other Joyce texts are small, so this corpus would be skewed towards having overwhelmingly more Ulysses and this is likely to create additional glitches in terms of similarity.

Again, you seem not to know what you are talking about. Anyone who looks at authorship attribution and who uses things like average sentence length and lexical diversity has to take into account the differences in the lengths of texts. This is very, very obvious. So obvious that stylometric analyses routinely employ a standard solution: chunking. (Search for "chunk" on this page, and you'll find I've mentioned this technique before.)

Texts of different lengths that are compared are routinely sliced into 500-word chunks. Or 1,000 words, or 10,000 words. Or whatever is appropriate. You can have each text chunked hundreds of times -- multiple stretches of text will be contained in many chunks, obviously, but one reason this is done is to prevent over-fitting: fitting your averages too closely to a particular set of chunks (say, near the beginning of the novel, which has different features from the ending). In this way, you can compare many texts of different lengths on things like average sentence length and lexical density, because the averages are calculated on identically-sized chunks.
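The mechanics are trivial, too. A rough base-R sketch (the file name is just a placeholder):

    # Slice a tokenized text into 1,000-word chunks; drop the short remainder at the end.
    words  <- scan("ulysses.txt", what = character(), quote = "")
    chunks <- split(words, ceiling(seq_along(words) / 1000))
    chunks <- chunks[lengths(chunks) == 1000]
    length(chunks)   # how many full-size chunks the text yields

Each chunk can then be fed to exactly the same per-text measurements, so texts of wildly different lengths are compared on an equal footing.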

This is such a bog-standard methodology in stylometry that it's not even explained anymore beyond a mention of which chunk size was chosen.

Applying chunking in stylometry software is a breeze: specify the chunk size and the number of chunks to be chosen, and you're good. Your methodology, which involves a lot of manual copy/pasting from .txt file to webpage to spreadsheet, with manual sorting and manual conversion of percentages to 1s and 0s, would not be able to chunk easily. It would take too much time, too many clicks, too many changes of environment and cut/paste. Again, your comment that "The R programming language is not relevant to my analysis because the basic tools for counting the number of punctuation marks etc. have already been programmed and are available for the public to use" makes it seem as though you are completely unaware of the kinds of things this software can do.

Do you do visualizations? R does...

199AnnieMod
Dic 7, 2021, 8:42 pm

>188 faktorovich: Questions are not criticisms.

"these tests could not be manipulated with any type of bias as they were calculated with basic counting software"
The software does not decide what to count or which of the counted data points to use; the software user does. And that's where the bias comes from.

200faktorovich
Dic 7, 2021, 8:43 pm

>194 Keeline: Your reflections on the nature of editing and publishing are irrelevant to the questions at hand. You are discussing 20th-century publishing rules, when the series we are discussing in this thread is about the British Renaissance. The degree of editing changed enormously over the centuries. As you should know, there was a wide array of spelling variations for words that was acceptable in the Renaissance, which would all be corrected as errors by modern publishers. What you consider to be an "enormous" number of changes might or might not be statistically significant for a computational-linguistic analysis. Were there more than 1 change per 100 sentences? Less than 1 change per 100 sentences would not even alter the percentage of given words/punctuation marks per 100 sentences. I have read original and edited versions of Dickens and Twain, and the first editions are very similar to the latest edited editions; these are the authors I tested in the mini-test made upon request, so you would have to look at their texts in Project Gutenberg if you want to find fault with over-editing. I named the 27 tests in the interview and repeated them in this thread. They are for: punctuation, lexical density, parts of speech, passive voice, characters and syllables per word, psychological word-choice, and patterns of the top-6 words and letters. 3 of the 6 Renaissance ghostwriters were publishers themselves (Byrd, Verstegan and Jonson), so it was their job to edit/typeset these texts. You guys really just need to read my book before making these statements; if you had read it, you would understand why these points are irrelevant and false when they are applied to the British Renaissance.

201Petroglyph
Modificato: Dic 7, 2021, 8:47 pm

>129 faktorovich: Sure, I can apply the tests to myself. You should know that I have done professional ghostwriting in the past, though I cannot disclose for whom. You would have to test millions of texts with around a million bylines to figure out what I have ghostwritten, as it is only a few projects in a sea of modern publishing. It would be less extreme for me to just test the texts with my own byline, but testing only my own texts against themselves would once again fail to establish what they are different from. For every text by me, I would have to add a few texts by other bylines writing in the present (and again, I try to avoid testing anything published after 1926). Testing works by any single given author produces a set of numbers that represent the range of that author's style; only when this style is compared with other styles do these tests indicate authorial attributions with my method.

>165 faktorovich: I am willing to test Ulysses against all digitized Joyce texts in full and in pieces and against around 30 other texts from Joyce's circle to determine attribution patterns

Look. We've been here before (my final comment in 140). Your methodology turns up ghost-writers almost everywhere because you are not allowing your method to do anything else. If your method is indeed applicable to, like, all of the public-domain authors from the 1560s through 1926, then it should be able to tell whether one of Joyce's works is different from the others, and just how far removed it is from them, and which of his other public-domain works is closest to it. It should be able to reveal which of your own texts cluster together, whether your fiction texts are systematically different from your nonfiction texts, and whether your interviews cluster differently from your monographs. No need to involve other authors or texts.

Instead, you only use it to compare texts with your non-fine-grained 27 tests with the specific intention of attributing similarities in text signatures to the same individual author. When asked to compare your works against each other, or Joyce's works, or Iain (M) Banks' works, you automatically make the jump to involving other authors and texts specifically to see which texts are similar-ish, specifically so that you can, in your own words, "determine attribution patterns". Detecting ghost-writers is not the only thing your methodology should be able to do. Assessing an individual author's signature across their body of work should be no objection.

Assuming the conclusion, garbage in garbage out, not testing for alternative explanations. Yadda yadda.

202Petroglyph
Modificato: Dic 7, 2021, 8:49 pm

Look. I know you worked very long and very hard on your method. And it can't be easy to hear that you've made some basic errors that invalidate your entire methodology and all the work you've built on it. If you choose to entrench in your sense of being right, I can't stop you. But I hope that I've highlighted some of the very fundamental issues with your methodology. As it stands, I can trust exactly none of the results of this methodology, because you're trying to make a basic set of unvaried tests do way, way more work than is plausible. I hope you take this on board and strive to improve your methodology. Some of the basics are sound, but some of its shortcomings are fundamental errors that cripple the whole enterprise.

You're enthusiastic. You have the drive, the time and the energy. The kind of analyses that you are trying to do cannot be reliably done with the kind of free online text analyzers that you use. You need more advanced tools. Perhaps a programming language, perhaps Statistics with Excel.

I know three things:

It is not my responsibility to peer review your work (even though I might as well have, lol).
It is not my responsibility or my job to explain again why your methods are unreliable and produce false results.
If these explanations aren't sufficient, nothing will be.

Peace. Here ends my peer review. I'm sure there will be much rejoicing.

203amanda4242
Dic 7, 2021, 9:04 pm

>196 Petroglyph: I just wanted to say how much I've enjoyed your contributions to this thread. I've always thought linguistics a fascinating field, and your posts have taught me a lot...even though some of it's over my head!

204Aquila
Dic 7, 2021, 9:07 pm

>202 Petroglyph: Thank you, Petroglyph.

205melannen
Modificato: Dic 7, 2021, 9:20 pm

>185 faktorovich: I'm not insulting your statistics professor! I'm complaining about the state of statistics teaching in the US. I too took my school's most advanced undergrad stats course, taught by one of the great experts in the field - you had to have passed linear algebra and multivariable calculus. I learned absolutely nothing that was useful for actually doing the stats work in my science and humanities classes, which is why I ended up also taking the stats for non-math-people class, which was much more useful.

We're on the same side here about the academy! I also agree that there are a lot of things that are gatekept and obfuscated in academia, which stifles new voices and helps bad ideas propagate. But you're barking up the wrong tree here - R is well-known partly because it has done a lot to make stats *more* accessible to non-specialists and people outside the Academy. The first example where R was brought up in this thread was somebody doing a stylometric analysis for fun, as a hobby! That's why I feel so strongly about this - it's really important that programs like R mean that good, rigorous stats are something anybody can easily do at home.

Here's how to run a stylometric analysis using R in less than fifteen minutes:
1. Go to Project Gutenberg. Download txt/UTF-8 files of some texts you want to analyze. Save them to a folder called "corpus". I downloaded a bunch of Bobbsey Twins, since we know they were ghostwritten and people like keeline have done the documentary research to figure out who the ghostwriters were.
2. Go here: https://www.rstudio.com/products/rstudio/download/#download and download and install the free desktop version of RStudio. (That website also sells a paid version that is a slicker version of the free software, but you don't need it.)
3. Open your newly installed program. Go to "tools"->"install packages". Type "stylo" and press "install".
4. In the "console" tab, type "library(stylo)" and press enter.
5. Then type "stylo()" and press enter.
6. This brings up a new window (it has a quill pen icon) with a bunch of options. You can adjust the options, but the defaults are pretty good. The only one I changed was I went to "output" and selected "jpg" so that my results will be a pretty graphic.
7. Press "okay".
8. Navigate to the folder your "corpus" folder is in and press "Select folder".
9. The window will go away and you'll see some stuff flash in the "console" tab. Then, go to the folder you selected and you'll see a .jpg file. Open it. There is a tree diagram showing how similar your texts are to each other! There are also some other files that have the word tables that Stylo used to make the tree, so you can load them into a spreadsheet and do other things with them if you want.

There you go, that took me less than half an hour including installing the program, finding the corpora on Gutenberg, googling a tutorial on how to run Stylo (since I'd never used it before and haven't used R in years) and then writing these instructions.

That isn't going to get you rigorous results, of course - for that, you have to do the work to understand the statistics that Stylo is running, and how to select your corpora correctly, and what settings to use for various things - but if you want an easily reproducible-by-anyone method, there you go!
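(And if you'd rather script it than click through the GUI, something like this should be roughly equivalent - I haven't tested it, so check ?stylo for the exact option names in your version of the package:)

    # Roughly the GUI steps above, as a script. Run from the folder that contains "corpus".
    install.packages("stylo")                  # only needed once
    library(stylo)
    stylo(gui = FALSE, corpus.dir = "corpus")  # defaults otherwise, as in step 6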

(Other people in this thread: you should play with this, it's fun. I am peacing out of this discussion here too because I have now installed R and am distracted playing with Stylo instead.)

206faktorovich
Dic 7, 2021, 9:21 pm

>196 Petroglyph: You did not list a hundred, and certainly not a thousand, different tests. What few thousand word-types? There are 4 word-classes (individual pronouns cannot be counted as word-types in this context, as this is statistically nonsensical, and it is better to just use the most common words). The biggest problem with most of the tests you did list is that the Renaissance texts include irregular punctuation (sometimes no periods at the end of sentences) and strange spellings/endings (which would skew your -ed/-n't endings, among other measures). Some of the meanings of words are different, or the spelling is so different that your machines would not recognize the intended word to classify it as an adverb vs. a verb. Commas and semicolons can also be mistaken or mixed up randomly, so measuring their order might confuse your system. The commas before "which" that are common in Modern English, and that you might use for this metric, are absent. If there is no period at the end of a sentence, your system would not be able to count what type of word begins the next sentence. Thus, while I considered using some of these more complex measures, I decided the basic tests resulted in the smallest number of glitches and the most consistently accurate results. And while you can list this large list of possible tests, I have read over a hundred computational-linguistic studies, and none of them described any such list; instead they tend to just say they test for word n-grams, in language that makes this sound convoluted. Can you cite a study with a list like the one you gave, and with the raw data of the matches/non-matches/quantities of all of these measures publicly available as produced with the R program?

If you select "unique/diagnostic expressions" on a text-by-text basis you are using your biased judgement to measure only the words you cherry-pick as relevant without any stated system for what this selection is based on in all cases.

I know that a combination of linguistic tests correctly identifies the linguistic-signature groups of tested texts: this is precisely what my method is. I simply use the right tests in a method that reaches consistently correct results because it is unbiased, whereas your method is designed to re-assert existing bylines even if those bylines incorrectly group texts with multiple signatures.

Your concept of "diagnostic words" is faulty because it assumes that anything other than the most-frequent words can be compared systematically in an entirely unbiased manner. I previously explained how your method's omission of the most and least frequent words and the use of unexplained special "diagnostic" words means you have given yourself a license to pick out which words are tested in a text without explaining how you have excluded the words you are not considering. Thus, your results can be entirely what you desire them to be and not what the data might have said if your goal was the truth and not hitting an over 95% accuracy rate based on currently-accepted bylines.

There are far more than 27 tests available on the platforms I linked to, and I did indeed choose these tests because they are relevant to the Renaissance corpus, as I have repeatedly explained.

There has not been any previous study that I have come across that has used 27 or more different types of tests, as I have done. Just send a link to this mystical study you are imagining that includes this list of hundreds of tests and more than my corpus of 284 texts from the Renaissance, and let's digest it together to see how it varies from my approach.

207Petroglyph
Dic 7, 2021, 9:23 pm

>203 amanda4242: >204 Aquila:
Thanks!

>205 melannen:
Very well done! I really wish I had your brevity.

208faktorovich
Dic 7, 2021, 9:28 pm

>197 Petroglyph: Yet again you have not actually opened the spreadsheets on GitHub to see the data I provide in them. If you open the Publishers spreadsheet, you will see that I specify the genre for all of the tested texts in a separate column. Some of the 6 Renaissance ghostwriters, like Verstegan, worked in several genres, including non-fiction, drama, translation, novels, etc. Others specialized in poetry (Byrd), and a couple worked in both poetry and drama (Percy/Jonson). Within the dramatic genre, Percy preferred writing tragedies and Jonson preferred comedies, but they also wrote in the other genre in a portion of their work. All this data indicates that my method identified the underlying author even when he wrote in multiple genres. The genre did not change the attribution calculation. An author who prefers a high quantity of commas will keep adding a lot of commas in a drama as well as in non-fiction. Any small genre-related glitches were not significant enough to impact the combined 27-test attributions.

209faktorovich
Dic 7, 2021, 9:47 pm

This message has been flagged by multiple users and is no longer displayed (show)
>198 Petroglyph: Since you keep saying I don't know what I'm talking about, I will be direct: you are an idiot. As I explained before, text samples that are only 500 words long are very prone to misattribution due to glitches caused by the genre-specific or topic-specific elements covered in that segment. These glitches disappear as a text expands, and especially when it is over 10,000 words. If you only test fragments that are under 10,000 words, you are deliberately setting up your experiment to include more glitches or likely misattribution errors. "Over-fitting": you almost seem to be saying that your complex programs cannot fit a text with over 10,000 words. The problem you mention regarding the beginnings/endings of texts being likely to have strange or unique features is the reason I test texts in their entirety, as this eliminates looking separately at these parts that might have more glitches in isolation. You don't seem to understand how "averages" work; the text can have any number of words, and the average of something in it is still going to be a simple number. Shrinking the size of the text by cutting it up does nothing but degrade the accuracy of the test because, as I explained several times, the smaller text-sizes create glitches on the average-tests. In the articles I have seen that use chunking, they tend to select oddly different numbers of words per chunk, instead of consistently using 500 or 10,000 etc. per chunk; this selective choosing makes it difficult to re-test a given text, and it means that some chunks are half the size of others and thus have more glitches. I include several tables and diagrams across the 14 volumes of the series, and many of the diagrams are included in the GitHub files; it still amazes me how self-centered you are that you are still not reading anything I have written in my studies and are just repeating the standard double-speak common in these computational-linguistic articles that have led to the current misattributions of the British Renaissance.

210faktorovich
Dic 7, 2021, 9:50 pm

>199 AnnieMod: The software counts the same type of measurements in my tests; the user does not change the tests for different texts; thus, my method does not have bias, unlike rival approaches.

211faktorovich
Dic 7, 2021, 10:02 pm

>201 Petroglyph: If you had tried teaching college English for a few years, as I have, you would notice that plagiarism is extremely prevalent. I have taught in Georgia, Pennsylvania, Arizona, Ohio, China, Texas - just all over the map - and TurnItIn has detected plagiarism in up to 98% of my students' papers. The College Admissions Scandal is not an exception but the cultural norm. A more relevant case is the recent revelation that "Carmen Mola" was not a woman but really three male ghostwriters, who won an award in Spain. I run a publishing company (Anaphora Literary Press), and have been working with writers and meeting writers at conventions etc. since 2009. I know why I am seeing ghostwriters, but I do not know why you are insisting ghostwriters aren't real.

Yes, I could test only Ulysses, or only Banks, or only my own texts, as I explained before; but these tests would not prove anything important. Testing individual works in isolation, by selecting one or a couple of "Marlowe" texts and a couple of "Shakespeare" texts, is how computational linguists have derived the current nonsensical re-attributions of the Renaissance. Small numbers of texts in a corpus will lead to either matches or non-matches, but not to who the underlying authors are, or whether a ghostwriter etc. was involved. Proving that one "Marlowe" text is similar to one "Shakespeare" text has been sufficient in earlier studies to re-attribute one of these texts, but this is extremely sloppy and careless. Have you tested your own, Joyce's, or Banks' works in isolation? If not, why are you telling me I have to run this test to prove my method works? I have already verified my method on several corpuses, which I guess you have not realized since you have not actually read my research.

212lilithcat
Dic 7, 2021, 10:03 pm

>209 faktorovich:

I will be direct: you are an idiot

"LibraryThing prohibits personal attacks, name-calling . . ."

213faktorovich
Dic 7, 2021, 10:09 pm

>202 Petroglyph: You did not perform a peer review of my research; rather, you repeated the talking points that commonly come up and that I have heard before in peer reviews. This is why I have already thought through the points you raise, having previously sent similar responses without receiving clarification from "reviewers" like yourself, whose primary objective in serving in the reviewer capacity is to exclude competing methods. Your goal is selling your services as a superior computer programmer, and making rival researchers feel like they have to surrender their testing methods to you so you can perform the tests for them. You would then return the "answers", or attributions that re-enforce established bylines, or ones that are entirely nonsensical and seem to show no logical attribution is possible in a corpus. I have had this exchange with a computational linguist before. I rejected these re-attributions because they were blatantly erroneous, and the researcher refused to provide raw data so I could check what went wrong. As I said before several times, no programming is needed in computational-linguistic author-attribution because the tools have already been programmed. You just profit from claiming the opposite because you make a living from re-programming.

214AnnieMod
Dic 7, 2021, 10:10 pm

>210 faktorovich: The bias comes from determining what you want counted, not from using different counters. A person made a decision about which metrics need to be counted. That introduced bias into the whole method - intentionally or not.

215Petroglyph
Dic 7, 2021, 10:12 pm

>206 faktorovich:
"You did not list a hundred and certainly not a thousand different tests"
Yes, I did. The "..." in that list is crucial and does much of the work.
Your not agreeing with some of them does not mean that they can't be tested for. And how do you know they are not relevant for your texts if you reject them out of hand? That's assuming the conclusion, lady. It's a no-no.

"The biggest problem with most of the tests you did list is that the Renaissance texts include irregular punctuation"
If your corpus has widely inconsistent punctuation... then don't test for it?? You do know that it is possible to not test for certain things, right? You don't have to!

"If you select "unique/diagnostic expressions" on a text-by-text basis you are using your biased judgement to measure only the words you cherry-pick as relevant without any stated system for what this selection is based on in all cases. "

False. The exact same kind of software that counts the "most frequent word" can count the words that occur more in a certain text per 1000 words (or whatever) than in other texts, relatively speaking. Not in absolute terms, in relative terms.

"diagnostic words"
"the words you cherry-pick as relevant without any stated system for what this selection is based on in all cases"


I ain't been picking no cherries, lady. "Diagnostic words" does not refer to hapax legomena, words that occur only once in a text and which you can select according to your own biases. It means "words that author X uses more often relative to other authors". Look, the word "bias" is not very common in everyday speech, but some people use it more than others. If you divide the total occurrence of "bias" by the total number of people who use it, you get an average. Some people use the word "bias" a lot more than others, and some people don't use it at all. Comparing an individual's use of the word "bias" to that average gives you an idea of how much that person's use of this word stands out from the average. If they use it more than the average to a statistically significant degree, then that word can be used as a diagnostic in helping to identify that particular person from the others. Those are the diagnostic words I speak of.

Some authors have these diagnostic "words" -- favourite shibboleths, typical expressions, a fondness for semicolons, a habit of including footnotes, a preference for "&" over "and", ... The list goes on. These things become part of their individual style -- tics they have that they do more than the average. And averages are easy to count across your entire corpus of text.
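To make that concrete with a toy example (the word "bias", the folder name, and the file layout are just for illustration -- this is a rough base-R sketch, not anybody's finished method):

  # how often does "bias" occur per 1000 words in each text,
  # and how far does each text sit from the corpus-wide average rate?
  files <- list.files("corpus", full.names = TRUE)    # a folder of plain-text files
  rate_per_1000 <- sapply(files, function(f) {
    words <- tolower(scan(f, what = character(), quiet = TRUE))
    1000 * sum(words == "bias") / length(words)
  })
  rate_per_1000 - mean(rate_per_1000)    # positive = uses "bias" more than the corpus average

A text whose rate sits far above the corpus average is a candidate for treating "bias" as a diagnostic word for its author -- and you can run the same loop over every word in the corpus, so nothing gets cherry-picked by hand.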

"Your concept of "diagnostic words" is faulty because it assumes anything other than the most-frequent words can be compared systematically in an entirely unbiased manner."

... you mean, like passive voice, which isn't a most-frequent word, but rather the count of a particular AuxV+Verb pairing divided by the count of all verb phrases? Of course you compare more than just absolute counts. Relative counts is where it's at!

"There are far more than 27 tests available on the platforms I linked to, and I did indeed choose these tests because they are relevant to the Renaissance corpus, as I have repeatedly explained. "

By your own admission: these tests are less applicable to your 1800s corpus. Some of your results may be caused by an improperly-balanced set of tests.

"Can you cite a study with a list like the one you gave and with raw-data publicly available of the matches/non-matches/ quantities of all of these measures that resulted using the R program?"

Ramnial, Hoshiladevi, Shireen Panchoo, and Sameerchand Pudaruth. 2016. ‘Authorship Attribution Using Stylometry and Machine Learning Techniques’. In Intelligent Systems Technologies and Applications, edited by Stefano Berretti, Sabu M. Thampi, and Praveen Ranjan Srivastava, 384:113–25. Advances in Intelligent Systems and Computing. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-23036-8_10.

If you're hit by a paywall, here's a pdf

Not sure if they used R. That doesn't matter, really. They used a set of 446 features, derived from the literature (i.e. commonly used tests in similar papers and books) and from machine learning, and compared them to a subset of 135, finding that the 446 features performed better. See sections 3 and 4.3. If you want an exhaustive list of all the features they used, there are three email addresses on the first page. Researchers love to talk about their stuff to people.

This paper should be of interest to you: it's about identifying plagiarized portions of Author X in texts handed in by Author Y.

216faktorovich
Dic 7, 2021, 10:20 pm

>205 melannen: I am going to check this method tomorrow. From looking over your directions, it is clear that the basic auto method only compares texts on word-frequencies, or on only 1 test, and the system appears to be selecting the words it chooses to test for the user. Thus the resulting data is likely to be extremely difficult to double-check to see how these words were chosen. Obviously the tables could not compare all of the words in all of the texts against each other, as this would include billions or trillions or more comparisons; so the word-selection process must be biased. I doubt the system would even let me run this basic test for free, but I will write a review after trying it tomorrow. Why exactly do you imagine this method of letting a more difficult-to-set-up program (than the simple-to-operate/open software I use) compare words (1 test) is more precise than my method of applying 27 different tests?

217Petroglyph
Modificato: Dic 7, 2021, 10:49 pm

>213 faktorovich:
"the talking-points that commonly come up and I have heard before in peer-reviews.
The fact that you ignore and/or dismiss peer-review comments does not surprise me.

This is why I have already thought through the points you raise, having previously sent similar responses without receiving clarification from "reviewers" like yourself
Peer review does not work that way, lol. You send paper to journal, journal asks people who know about the subject to read it thoroughly and make sure that it's up to snuff.

Responses are usually
  • "publish as is" (coveted, rare, oh shiny city on the hill)
  • "publish with minor revisions" ("look, scholar, here's some issues that we had. Fix those and you're golden")
  • "publish with major revisions" ("Some of this is a bit of alright, really. Some of it stinks. Don't publish unless massively rewritten")
  • "do not publish" ("It's not up to snuff. Now go away")


You don't correspond with the peer reviewers. They don't know who you are, you don't know who they are. It's a double-blind kind of thing. If the journals you submit to don't do that double-blind thing, I've got bad news for you.

The journal, if it is worth its salt, has a solid tradition of selecting appropriate and skillful peer reviewers. If the quality of the published content goes down, people will notice and flock to other journals. There's an interest in keeping standards high.

Your goal is selling your services as a superior computer-programmer, and making rival-researchers feel like they have to surrender their testing methods to you to perform the tests for them

Crikey, enough with the ad hominem!

"As I said before several times, no programming is needed in computational-linguistic author-attribution because the tools have already been programmed"

Look. The sad truth is that Science Marches On. The easy things get done first. Then the less easy things. Then the hard things. Occasionally new areas open up where new groves of low-hanging fruit become available (e.g. ancient dna sequencing. Super interesting!).

But sadly, much of the low-hanging fruit has been picked. If you want to make new discoveries, you have to put in the work and understand how all the low-hanging fruit got picked, and all the medium-hanging fruit, before you really can start reaching for high-level fruit. Once you're there, the ladder you used to reach the low-hanging fruit won't be sufficient any more. You need better tools.

Also, some of the low-hanging fruit is rotten inside. You don't know until you've picked it and discovered worms or wasp eggs inside and have an explanation for why it is rotten. Some of the high-hanging fruit is rotten, too, btw.

"You just profit from the opposite because you make a living from re-programming"

218melannen
Modificato: Dic 7, 2021, 10:59 pm

>216 faktorovich: That program lets you do anything it can do for free!

The basic options in the window that comes up let you choose which and how many tests to run, and how many words to compare against each other, and how to select those words, and a lot of other things. It has tooltips explaining what each of the options does. You can also compare things other than words. There is also an option for putting in your own manual word or character list for it to use instead of generating one.

(You *can* have it compare all the words in all the texts, but it has a built-in warning that if you want to compare more than 5000 different words, you need a supercomputer. Still, that's dozens of millions of comparisons, and my cheap laptop did it in a few seconds just fine. Computers are good at doing billions of simple computations really fast! That's why being able to use things like R is so powerful.)

Those are just the basic options available in the easy-to-use window interface in my instructions - but there are also lots of other things you can do with Stylo and R, and plenty of free online resources explaining what they are. Figuring out what tests you should run is the hard part; once you've done that, it's easy in R to show other people how to do the same thing.

I am making no judgements about the precision of your method. I'm merely pointing out that it is actually just as easy to use the standard methods of stylometry, using free software like R that has the standard methods built in, as it is to use your method. You keep saying it isn't, and that one of the reasons your method is superior is that anyone can reproduce it. But it's clear you haven't actually done any looking into how to actually use the standard methods before making that claim, because they are just as easy to reproduce! I hope you do try the stylo package and find it useful for your work. If you're presenting a revolutionary new way of doing things, it's really important to be able to demonstrate that you understand the standard way of doing things well enough to criticize it accurately.
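For anyone else following along at home, the whole setup really is only a few lines once R itself is installed (a minimal sketch using the defaults; the options window handles everything else):

  install.packages("stylo")   # one-time install
  library(stylo)
  stylo()                     # opens the options window; by default it looks for
                              # your plain-text files in a folder named "corpus"

Everything after that - which tests, how many words, what kind of graph - is picked in that window, so the choices are visible and repeatable.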

219Petroglyph
Dic 7, 2021, 11:32 pm

Oh, more software that is absolutely free and will vastly improve your corpus-handling skills without requiring coding skills: Antconc. There were times in my life when I used that application daily. Good times. Miss ya lots, anty; gros bisous.

Oh wow: there's more freeware apps available there now than when I used it. That team's been busy like, uh, bees.

faktorovich, you might find antgram useful (for generating n-grams). Looks like you might have use for antpconc too: this tool analyzes parallel corpora. Lots of screenshots, and every utility comes with its own help pdf.

ProtAnt compares a UTF-8 text you feed it to a reference corpus you also can set. It'll show you the ways that the target text deviates from the reference corpus.

There's a file converter there, too, to create UTF-8 txt files from other formats.
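(And if you'd rather not install anything at all, word n-grams take a couple of lines of R -- a rough sketch, with a made-up sample sentence:

  sentence <- "the quick brown fox jumps over the lazy dog and the quick brown cat"
  words <- strsplit(tolower(sentence), "\\s+")[[1]]
  bigrams <- paste(head(words, -1), tail(words, -1))   # every pair of adjacent words
  sort(table(bigrams), decreasing = TRUE)              # bigram frequency table

"Generating" n-grams just means listing every run of n adjacent words or characters and counting how often each run turns up; nothing mystical about it.)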

220spiphany
Dic 8, 2021, 3:51 am

Petroglyph, melannen, I just wanted to thank both of you for your patient, rational, and knowledgeable contributions to this discussion.

I realize this thread probably feels like a bit of a pile-on to the author, so I want to note that I do think stylometrics -- when applied to specific questions in specific contexts by people who know what they are doing and have taken all relevant factors into account -- has a lot of potential for interesting insights.

It's a pity the field hasn't necessarily been presented in its best light here, particularly for people who didn't have much or any previous familiarity with it. (Pennebaker's The Secret Life of Pronouns was my introduction to the topic; I thought he brought up a lot of intriguing points but occasionally reached too far in making sweeping claims about the explanatory power of the relative frequency of function words. But without this previous exposure I'm not sure I would have had any idea based on the blog post or this thread what the author was doing or why counting words and punctuation marks could tell us anything useful...)

221andyl
Dic 8, 2021, 6:19 am

>188 faktorovich:

And this is where it is obvious (if it hasn't been already) that you do not understand what AnnieMod and others are saying.

If we pick a set of books where the authorship isn't in question - does your methodology find differences or not? This is the same test people talked about with Iain Banks/Iain M. Banks or James Joyce. Arthur Conan Doyle would also be a good test case.

If we pick a set of books where the ghosts are known (Stratemeyer), does your methodology identify them all as distinct individuals (and this should be inside a single series and across series)?

The idea is that you prove your methods work against a data-set where the results are known and indisputable, before you try them on a 'potential' unknown.

222timspalding
Dic 8, 2021, 9:17 am

Members are reminded that the Terms of Service ( https://www.librarything.com/privacy ) prohibit personal attacks. Comment on the content, the argument, or the wording, not the person. Thank you for your understanding and for following this rule more closely in the future.

223susanbooks
Modificato: Dic 8, 2021, 10:39 am

>184 faktorovich: "Your argument about Nichols being a "dimwit" is extremely biased and irrational. What proof to you have for his stupidity, or for his isolation in an attic, and if he was isolated in an attic, don't you see a similarity there between this biographical point and the mad woman in the attic in "Charlotte Bronte's" Jane Eyre?"

See, here's why some people aren't buying what you're selling. Every biographer since Gaskell (who knew him) said A B Nichols was a dimwit. Charlotte thought so, too, until she married him. Further, I never said Nichols was in the attic. I said, and again, you'd know this if you studied Charlotte only casually, Nichols put all of Charlotte's literary remains (including the pillar portrait) in an attic where they grew moldy & some disintegrated. (The fact that you'd take Nichols' storage of Charlotte's things in an attic after her death as proof of his connection to Jane Eyre is a sorry exhibition of your thinking)

As for Emily & Anne's writing being similar: they had been writing partners since their early childhood (again, any biography, however brief, will tell you this). We have manuscripts of their work, letters, and diaries, including from people outside the family (Ellen Nussey, for instance) that confirm this. Unless those master forgers wasted time faking juvenilia, much of which wasn't found until after all of their deaths.

If you'd ever actually read some of the authors you're talking about, you'd know that Anne is an ironic realist -- her tone is unmistakable and utterly unlike her sisters' who were more heavily influenced by the Romantics.

You don't know enough about your subjects to be making these grand statements. What is your PhD in? Certainly not the Humanities.

224susanbooks
Dic 8, 2021, 9:55 am

>187 cpg: Sorry, you're right, it was Newby. My bad.

225susanbooks
Dic 8, 2021, 9:57 am

>168 faktorovich: They were not, until later editions, published by the same publisher.

226melannen
Modificato: Dic 8, 2021, 9:59 am

>221 andyl:

Actually on my Bobbsey Twins (+ a few Rover Boys and Uncle Wiggily) tests last night, it was consistently sorting the first book (which is supposed to be by Stratemeyer himself) in the middle of the books that are supposed to be by Howard R. Garis, while pulling out the ones by Lillian Garis accurately. Which is suggestive!

But given that I don't know what I'm doing and the set of texts was selected based on when I got sick of downloading them, not much more than interesting - I would need to actually learn what the different Stylo tests are good for and do some real tests with other texts, not just play around trying everything, before I was confident even starting to criticize the rigorous documentary research. (Playing around is the first step in science, not the final step.)

227susanbooks
Dic 8, 2021, 10:00 am

>182 faktorovich: "So, you're saying an author sits down, writes a sentence, and then does no writing for the rest of the day? Is this really a believable concept for any human that has a passion for writing?"

Woolf covered this in A Room of One's Own and Tillie Olsen updated that for the later 20th century in Silences. Women's Studies courses cover this, too. Again, your lack of knowledge about literary history makes your arguments absurd.

228susanbooks
Dic 8, 2021, 10:06 am

>203 amanda4242: ">196 Petroglyph: Petroglyph: I just wanted to say how much I've enjoyed your contributions to this thread."

Agreed, and the same to everyone who has been posting rationally.

229anglemark
Dic 8, 2021, 10:22 am

I think only three people have been posting to this thread and I have the statistics to prove it.

230melannen
Dic 8, 2021, 10:42 am

>229 anglemark: I always operate under the assumption in any internet thread that it's just one person talking to themself. Especially if I'm on the thread.

231Petroglyph
Dic 8, 2021, 10:49 am

>209 faktorovich:
No, you misunderstand.

Of course a single 500-word chunk is going to be unreliable. This is known. But where did I say that a single 500-word chunk is all that is taken?

But a 50,000-word novel can be divided into 100 500-word chunks without overlapping. Or 50 1,000-word chunks. If you analyze all of these individually and then take averages across them (for instance), the quirks of individual chunks will disappear, and the repeated author-signature features will be amplified. You've gone from one block of usable text from which to take averages and frequency counts to fifty or a hundred!

There's methods that involve a movable chunk: rolling the window through the text. Say you take a chunk / window of 3000 words and analyze the frequencies and patterns etc for that stretch. Then you move your window 500 words ahead and start the next chunk at the 501st word. Redo the tests for that chunk. You continue rolling the window forward and testing each chunk until the window contains the final word in the text. That is your final window.

To state the obvious: many stretches of text will be sampled repeatedly: the last 2500 words in the first window are the same stretch as the first 2500 words in the second window. But moving a window of a few thousand words forward in steps of a couple of hundred words like that really reveals what odd/noteworthy/deviant frequencies of words, phrases, structures and grammatical or typographical phenomena are concentrated in a small portion of the text and, therefore, less relevant to the text/author as a whole. It allows you to distinguish one-off irregularities from repeating regular irregularities that occur in many windows across the entirety of the text.

This paper explains a few ways of doing this. This paper applies one of these methods to look at collaborative writing and to disentangle multiple authors in a single text. (If you want the pdf, ask and you shall receive.)

Instead of having one single measure of an author's language use (one text), you can now break up a single long text (such as a novel) into several dozen or even hundreds of chunks of useful size (usually 1k, or 3k or 5k -- depending on the nature of your corpus). It's a magnifying glass. Much more detailed look.

Doing this manually on an entire corpus is prohibitively time-consuming. The R stylo package has a function rolling.classify which does this automatically. I don't just mean sampling 3k words in steps of 500, but also doing the tests and keeping track of which phenomena are occasional irregularities and which are interesting. Other stylometry software has its own ways of implementing this technique.
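If that function feels like a black box, the core of the rolling-window idea fits in a dozen lines of plain R (a sketch only -- the file name, the target word and the window sizes are arbitrary, and rolling.classify does much more, including the actual classification):

  # slice a text into overlapping 3000-word windows, stepping 500 words at a time,
  # and record how often "the" occurs per 1000 words in each window
  words <- tolower(scan("novel.txt", what = character(), quiet = TRUE))
  window <- 3000; step <- 500
  starts <- seq(1, length(words) - window + 1, by = step)
  rate <- sapply(starts, function(s) {
    chunk <- words[s:(s + window - 1)]
    1000 * sum(chunk == "the") / window
  })
  plot(starts, rate, type = "l",
       xlab = "position in text (words)", ylab = "'the' per 1000 words")

A spike confined to one or two windows is an incidental quirk; a rate that stays high across most windows is a habit of the text as a whole. That is exactly the distinction a single whole-text average throws away.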

This is how that Good Omens experiment was done, btw, with windows of 5000 words. (You need a test corpus of undisputed Pratchett texts and undisputed Gaiman texts first, obviously). When Gaiman suggested Callaway look at Pratchett's Pyramids, too, the method detected Gaiman's input in that book. Importantly, a book that the method had not been trained on. So the process works! It's an attractive method for someone looking to identify multiple author signatures (and where they occur) in a given novel. Of course, your choice of tests is crucial, too.

"I test texts in their entirety, as this eliminates looking separately at these parts that might have more glitches in isolation.

You cannot distinguish occasional / incidental "glitches" from author-specific irregularities and deviances from other texts in your corpus. Besides, this method tests the text in its entirety, too. Just at a much more detailed level of granularity.

You're thinking manually -- how to do these things yourself, by hand. That's just impractical! Computer-assisted methods put so much more power at your disposal. Computers are really good at doing this mechanistic and repetitive process automatically.

Methods that are this impractical to do manually in hand-crafted spreadsheets may seem esoteric and unachievable to you. But I repeat: Science Marches On. We're well past that stage now.

"In the articles I have seen that use chunking, they tend to select oddly different numbers of words per-chunk, instead of consistently using 500 or 10,000 etc. per-chunk"
I'm aware of chunking methods that have a fixed starting point (the 501st word, the 1001st word, etc) but that run to the end of the paragraph.

"this selective choosing makes it difficult to re-test a given text, and it means that some chunks are twice smaller than others, and thus have more glitches"
Really? One text's chunks were twice larger than the other? In the same study? Where did you see that?

By the way, it's not "selective choosing". It's a systematic, mechanistic process of working through the entire text in steps. There's no hidden bias here. Just maximizing the ways in which you can pull information out of a single text.

232Petroglyph
Dic 8, 2021, 11:02 am

233melannen
Dic 8, 2021, 11:37 am

>219 Petroglyph: Oh, those all look really cool! Thanks for sharing the links. (The last time I messed with analyzing corpora it was for a stunt and I used Phylip, because I was trying to reconstruct the sequence a round-robin game was originally written in, but that was really not what that's supposed to be for!)

234paradoxosalpha
Dic 8, 2021, 11:43 am

>229 anglemark:
My contributions to this thread have certainly been ghostwritten.

235faktorovich
Dic 8, 2021, 11:49 am

236faktorovich
Dic 8, 2021, 1:17 pm

>205 melannen: As I expected, I downloaded RStudio, as you stated in the first step, but after downloading it, when I try to open it on my computer, I receive the message "RStudio requires an existing installation of R in order to work. Please select the version of R to use". So, I had to add the extra step of downloading R: https://cran.r-project.org/bin/windows/base/. Then, because most of the Renaissance texts are only available on EEBO, which only provides basic text content and not the "txt/UTF-8" format, I had to look up the steps for converting Word documents into UTF-8: https://support.3playmedia.com/hc/en-us/articles/227730088-Exporting-a-UTF-8-txt.... After saving a couple of files in the "Unicode (UTF-8)" format, I went back into Stylo to check if these files could be seen by the program for evaluation, and found that the program did not recognize this file type, nor the simple plain text I had tried before. Thus, I could reach step 8, but could not actually test this software to see if it works and what its results would be like. I have seen similar problems with all of the computational-linguistic programs I had tested in my research phase, and this is why I opted to create my own method and to use simple tools that are available to the public without these complications/glitches. Do you know what the solution is to Stylo not recognizing the "UTF-8" files? If so, I will try to finish the steps to find out what happens. But this at the very least proves that this system is not for the general public, especially since even the basic steps you provided here are not available in any of the "Help" manuals I have checked in similar programs (I think I tried this program as well, and probably could not get far due to not knowing the special steps, and not having any instructions from the programmers).

237cpg
Dic 8, 2021, 1:43 pm

>217 Petroglyph: "If the journals you submit to don't do that double-blind thing, I've got bad news for you."

In my field (mathematics), single-blind is the standard, not double-blind.

238faktorovich
Dic 8, 2021, 1:44 pm

>215 Petroglyph: "Words that author X uses more often relative to other authors": This is why I use the most frequent words only, as this indeed establishes which words a given author uses more than any other words. Your methodology just turns the term "relative" on its head, by subtracting some words that are perceived to used less or more, or at the same degree out of consideration. My method just records the patterns of the most frequent words and compares them systematically without disqualifying or ignoring any of the words. "If they use it more than the average to a statistically significant degree, then that word can be used as a diagnostic in helping to identify that particular person from the others." Several biased interventions are required for the type of analysis you propose. 1. The choice of texts in your corpus can be manipulated so that "bias" is used with less or more frequency in the "average" for this group of selected texts. 2. How the researcher defines "statistical significance" is not likely to be stated in the article, and so a researcher has intervene to declare some variance from the average as significant while other variations are cherry-picked to be insignificant. 3. Your system invites the researcher to select a few "features" or words or punctuation marks that are chosen for comparison against the average, whereas my system evaluates all possible words and pre-determined tested punctuation marks simply on their frequency.

"Relative counts" becomes nonsensical in a corpus of 284 texts, as creating relativity and average calculations for every word is bound to create nonsensical results.

After some searching, I found the paragraph in the article you are citing that describes the "features" they tested for: "word length, character n-grams, function words, sentence length, punctuation, unique words, vocabulary richness, PoS tags and many more. There are also some new features like symbols, combined-words (e.g. web-based), word endings (‘ll, `ll, ed, ing, ion, ly, n't, s), sentences starting with the word “the” and sentences containing the following words (and, etc, e.g, what, which)." They claim to be comparing an experiment where they tested 446 features to 153 features, but this brief paragraph is all of the features they actually specify. In my experience, none of these computational-linguists respond to emails, unless they hope to be hired to conduct a study. If the point of their article was testing feature-quantity, they had to include the full list of features and to specify what the difference between the features was between the 446 and 153 tests. Their specified features are things that my method tests for as well (word-length, most-frequent characters, frequent words, sentence length, punctuation, linguistic density, mood measurement). The more rarified few features they mentioned are not applicable to the Renaissance where the language variance would create glitches if such tests were attempted. This article does not specify these features because they are counting individual words as features, or have other glitches in their chosen method. Since they say they are using their own software, readers cannot check their findings by running the data through their own software with the same tests. Incorrectly seeing no plagiarism where there is plagiarism with a biased method is far more problematic than simply ignoring the presence of plagiarism in PhD dissertations, so this paper is troubling vs. interesting from my perspective.

239faktorovich
Dic 8, 2021, 1:52 pm

>217 Petroglyph: In one instance, I received an acceptance letter from a periodical for a central article that explained my computational-linguistic method for the Renaissance. However, the editor asked me to address the reviewer's concerns before a final decision could be reached. The "concerns" were nonsensical or made demands for me to insert irrelevant information, to alter my findings with falsified data, etc. The researcher asked me to do a couple of rounds of editing before insisting that I pay him to perform the experiment himself with his own method and to become a co-author on the paper. I refused to do so, and so the paper was immediately rejected, without an option for an alternative reviewer. If you think this is how scholarly peer-review is supposed to work in a society that hopes to make any progress in science/knowledge, you are in the same boat as this editor/peer-reviewer.

240faktorovich
Dic 8, 2021, 2:00 pm

>219 Petroglyph: The concept of "generating n-grams" is nonsensical. An n-gram is any feature x that is being evaluated, like letter and word frequency. Basic counting systems calculate the most common words/letters etc. What is the point of generating letters/words? From what are you generating them, and to do what? My method analyzes any size of textual corpus, including comparing linguistic signatures that can be visualized as forming their own groups within the corpus. So, now you are explaining that UTF-8 files can only be created with this special tool? What's the difference between the Word UTF-8 file and the one this system creates? Can your Stylo program see one, but not the other? There are basically at least 5 different programs that have to be installed to use Stylo, and it doesn't even work with all of them - and you think this is a working scientific method that is usable by literature specialists who are not programmers?

241faktorovich
Dic 8, 2021, 2:03 pm

>220 spiphany: You can also read my Re-Attribution series where I explain my method and compare it to the previous computational-linguistic approaches in great detail that should be far easier to understand and far more informative than "Secrets".

242faktorovich
Dic 8, 2021, 2:07 pm

>221 andyl: I have already tested my method in this discussion on 3 known authors (Twain, Dickens, and a third) and have shown that my method works. I have said that if any researcher in this discussion wants to put any larger set of texts to the full range of tests, I would be delighted to run the experiment, with me testing the same corpus of texts as them (it can be more than 1 researcher using more than 1 alternative method). I proposed for all of us to post our raw data/method on GitHub and to link to it in this discussion, and to discuss the results. Nobody has taken me up on this challenge. I have already made all of the tests I needed to on my end. So this experiment would be a test of rival methods, which is something I am curious to explore in detail given the lack of such raw data in published computational-linguistic articles, as I have been explaining.

243faktorovich
Dic 8, 2021, 2:09 pm

>222 timspalding: Thank you Tim for clarifying. Given the various types of insults that have been hurled at me across this discussion, I had assumed returning insults might have been a required part of scholarly dialogue in our modern age. It is good to find out this is not the case.

244faktorovich
Dic 8, 2021, 2:17 pm

>223 susanbooks: If you had been clearer in your statements about the relationship between Nichols and the attic, I could have addressed your intended meaning, instead of questioning whether your attic allusion was referring to Jane Eyre. You are the one who is thinking about attics; it was not something that came into my mind in this context.

Yes, what I am saying is that the evidence I briefly browsed in the Guardian's article about the Sotheby's sale is that a forger faked "juvenilia" after their deaths. This is not something I have spent any serious amount of time researching; this is just what the surface facts are blatantly saying.

I have closely read all of the Bronte novels I tested as part of my literature MA/PhD curriculum. "Ironic realist" vs. romanticism in tone: this is how you distinguish authorial style. I have written mystery, romance, fantasy etc. novels that have included romantic and ironically realistic tones; the genre/ tone an author uses does not define their quantifiable linguistic style. Tone is a conscious choice, whereas quantitative linguistic features are unconscious usage patterns that the writer cannot deliberately control. My PhD and first McFarland book are on 19th century British novels, or the "Brontes'" period.

245andyl
Dic 8, 2021, 2:21 pm

>242 faktorovich:

I am sorry but the test you describe in post >124 faktorovich: does not address what I was aiming at. Post 124 seems to imply that for 1 text each from Twain, Dickens and Bulwer-Lytton you can distinguish them as having separate authors. That isn't the point of my post at all. My point is: can you take the entire corpus (or a reasonable subset) of Conan Doyle and run it through your tests, and does it give one unique author or does it not?

TBH as someone more maths minded I think any process with a binary output is flawed to begin with.

246SandraArdnas
Dic 8, 2021, 2:35 pm

>243 faktorovich: No one, literally no one, threw insults at you. Their posts would have been flagged just as yours was.

247MarthaJeanne
Dic 8, 2021, 2:39 pm

>182 faktorovich: Have you ever heard of revising, rewriting, making changes ...?

248faktorovich
Dic 8, 2021, 2:48 pm

>227 susanbooks: My MA thesis was about Woolf's Room, and my Author-Publishers book covered Woolf's publishing company. Nowhere in Room does Woolf say that women write only one sentence per day. Your assertion is false, and nonsensical. Room is about a woman having space and time of her own to write full-time as a professional, and this is the opposite of women being forced into Silences. You really have to use specific quotes from these books to support your arguments if you believe your statements have rational meaning behind them.

249Petroglyph
Dic 8, 2021, 2:51 pm

>238 faktorovich:
"Words that author X uses more often relative to other authors": This is why I use the most frequent words only, as this indeed establishes which words a given author uses more than any other words."

No, no, no, no. "Most frequent" is an absolute measure. Which phrase does this text happen to contain most often? In order for that to be meaningful, you'd have to compare the frequency of that phrase in your corpus in general, against other texts.

Most frequent does not mean "most diagnostic". It is sheer raw frequency. Unless you can demonstrate that a particular phrase occurs more than you expect it to, or more than in other texts, more than with other authors, sheer absolute frequency means nothing.

250melannen
Modificato: Dic 8, 2021, 3:08 pm

>236 faktorovich: I'm sorry I missed the download R step! I already had it installed so I didn't realize. It looks like the process of getting it via RStudio was pretty painless though - that's good!

Yes, completely free (no ads, no hidden fees, not web-based) software will often take a few extra steps, and they do have a little bit of a learning curve, but putting in a few days to learn the software is really worth it.

I'm not super familiar with EEBO. I used Gutenberg texts in my examples since I was familiar with that (and there are a lot of early English texts there as well.)

It's odd that a collection of early language corpora wouldn't have a UTF-8 download option! Most of the EEBO repositories I found with a quick google are only available to people with academic credentials, though (super frustrating!) so I can't really help you there.

It's super weird that it would only be available as .doc files though! Those are really bad to use for any sort of stats work, since they're full of irrelevant formatting and fancy encoding that can make it harder to get good data. Word files aren't intended for storing data, they're designed for making nicely formatted documents to print, so they don't work well if all you want is to store data. I'm not sure why it's not recognizing your converted .doc files. It might be something carried over from the .doc conversion.

It's actually a good thing that programs like this are often super picky about file format. File format issues can introduce a lot of bad data into your statistics, so anyone doing this kind of work needs to make sure they understand the way different text formats work and how that can affect the data they're getting. Especially with character-level analysis, being absolutely sure that, say, your file isn't getting confused about ` vs ' and that kind of thing is the sort of stuff that requires a basic understanding of encoding and text file formats.

Were you copy-pasting from a web-only HTML version of EEBO like https://quod.lib.umich.edu/e/eebogroup/ into Word? In that case the simplest thing would be to paste into a program like Notepad instead of Word - recent versions of Notepad save as UTF-8 by default (UTF-8 is just a standard way of encoding the characters in a plain .txt file, which is why it's so widely used.) You might lose some information that way, though. It would be better to find versions of those texts available that are already in a data-only format (maybe through one of the limited-access versions of EEBO?)
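If you do end up with files stuck in some other encoding, R itself can re-save them as UTF-8 -- something like this (a sketch; the file names and the "latin1" source encoding are placeholders for whatever your files actually use):

  # read a text file in its original encoding and write it back out as UTF-8
  lines <- readLines("play_original.txt", encoding = "latin1")
  writeLines(iconv(lines, from = "latin1", to = "UTF-8"),
             "play_utf8.txt", useBytes = TRUE)

That way the conversion step is itself written down and repeatable, instead of depending on whatever a word processor's "Save As" dialog happens to do.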

251Petroglyph
Dic 8, 2021, 2:54 pm

faktorovich:

The method described in >205 melannen: works. There's a few things I want to add, though.

RStudio is merely a graphical front end for R. You need both R -- the actual programming language that does the statistical analyses and generates the graphs -- and RStudio -- which wraps that command-line interface in a friendlier window, with panes for graphs and point-and-click things.

Don't use Word -- machines have a hard time reading that -- too many distortions from plain text. Plain text is where it's at. Test it on Gutenberg texts -- those come in plain text.

Same goes for tables. .xlsx files are proprietary, and mistakes may be made in converting MS Excel files into something more easily digestible by corpus software. .csv or .tsv are standard formats.

Stylo, by default, expects your corpus to be in a folder called corpus.

It expects that folder to be located in your "working directory", which is the default directory that R uses to look for files you tell it to read and where it puts the graphs and the other files it generates. If you're not sure which folder is your working directory, type getwd() ("get working directory"). This will tell you what folder R is in, and where it will look for a folder called "corpus". Put your corpus folder there.

If you follow the steps in >205 melannen:, R will create a graph for you (I'd suggest selecting a PCA under the "statistics" heading, with Classic delta as the distance). It will also dump into your working directory a word list of all the tokens in your corpus; a table_with_frequencies.txt, which gives the frequencies for all the word tokens in your corpus per text; and a stylo_config.txt file, which lists the configurations: the dimensions of your graph, which analysis was performed, the distance measure, and much else besides. If you want to recreate a graph / analysis: here are all the parameters you need.
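In code form, that whole setup is something like this (a sketch; the output file names are the ones stylo wrote out for me, so check your own working directory after a run):

  library(stylo)
  getwd()                # the folder R is working in, and where it will look for "corpus"
  dir.create("corpus")   # put your plain-text (UTF-8) files in here
  stylo()                # pick your options in the window that pops up
  # when it finishes, the working directory holds the graph, a word list,
  # table_with_frequencies.txt and stylo_config.txt -- everything needed
  # to recreate the analysis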

252Petroglyph
Modificato: Dic 8, 2021, 3:08 pm

Alright. Time for a Lunch Break Experiment.

I've used the following texts from ProjGut (selecting the ones without illustrations -- just plain text):
Austen: Emma, Mansfield Park, Northanger Abbey, Pride and Prejudice, Sense and Sensibility, Lady Susan.
A Bronte: Tenant, Agnes Grey
C Bronte: Jane Eyre, Professor, Shirley, Villette
E Bronte: Wuthering
Marie Corelli: Romance of two worlds, Sorrows of Satan, Young Diana

To state the obvious: I am aware of just how ridiculous it would be to compare these books for authorial attribution purposes (regardless of what you may think about the Brontes). This is a fun little exercise (well, fun for a nerd like me), a totally-not-serious playing around with some software with no pretensions to any solid conclusions!

I removed from these texts all of the Gutenberg legal text at the end, the Gutenberg text at the beginning, any prefaces, preambles, introductions, tables of contents. All texts start at Chapter 1. We're comparing fiction by these authors, not their introductions to various editions, or other people's prefaces. Properly speaking, I ought to also remove all the chapter headings from the text, and preferably any poetry etc. But for this Lunch Break Experiment, that's too much.

Austen: early 1800s, Brontes: mid-1800s, Corelli: romances from the late 1800s.

A zip of the Austen texts is available here. A zip of the entire corpus (Austen + others) is here.

In part one: How do Jane Austen's novels compare to each other? I expect that Lady Susan would be the outlier: it's the only epistolary novel (i.e. written in the I-form).

Here is how they cluster: a cluster graph that I got by following the steps in >205 melannen:. I selected jpg under the Output tab, and Cluster Analysis under Statistics.



So. Northanger and Mansfield are fairly close; and so are Sense and Pride. These two pairs are more similar to each other than Emma. Lady Susan is hella different.

How does this look in a scatterplot? Again, using >205 melannen:, but selecting PCA (corr) under Statistics.



Once you have a measure of how different these novels are from each other (or how similar), you can translate this difference into distances. You can then plot the distances between these novels on a single line, or on two, or on three. This graph uses two lines (the x-axis and the y-axis), each of which represents a different kind of difference / distance. It is up to the researcher to determine what these distances measure.

The closer two texts are in this graph, the more similar (or, properly, the more not-different) they are.

The percentages on the axis labels are how much difference is covered by them. The X-axis explains about half of the difference between these novels. The Y-axis almost a quarter.

The distances between the novels along dimension one, the x-axis, is mainly about separating Lady Susan from the others. This is clearly the only first-person novel that is massively different from the others (because of the difference in pronoun use). I can't really tell what the second dimension represents. Why are Sense and Emma furthest apart? Mans and North cluster together, and Pride is closer to them than to Sense. Off the top of my head I can't think of an explanation. But I haven't thought about this, at all. And I'm not going to spend more energy on this, either. The distortion caused by Lady Susan may be making other things less visible. If I wanted to, I could remove Lady Susan from consideration, and try again. But that would take me too far.

This is why you have to be careful with which pronouns or which function words your model tests for! This is why it is important to be able to remove certain distorting characteristics from consideration! This is why it is important to make sure that you are comparing apples with apples, or that you are comparing apples with oranges with the appropriate tests!
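(For anyone wanting to replicate this without the clicking: the batch-mode equivalent is roughly the following. Untested as pasted here, and the argument names are taken from the stylo documentation, so check ?stylo if your version differs:

  library(stylo)
  # cluster analysis of the texts in ./corpus, Classic Delta distance, jpg output
  stylo(gui = FALSE, corpus.dir = "corpus",
        analysis.type = "CA", distance.measure = "dist.delta",
        write.jpg.file = TRUE)
  # the same corpus as a PCA (correlation matrix) scatterplot
  stylo(gui = FALSE, corpus.dir = "corpus",
        analysis.type = "PCR", distance.measure = "dist.delta",
        write.jpg.file = TRUE)

The stylo_config.txt that gets written out records the settings either way, so the GUI route and the scripted route are equally reproducible.)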

253Petroglyph
Dic 8, 2021, 3:01 pm

Part two: let's plot the Austen novels, the Brontes' novels, and three of Corelli's novels together. Ridiculous, I know. Are they similar to each other? Different? Will Austen and Corelli stand out as separate authors? Will the Brontes? Why would that be the case?

Here is a cluster graph (same specifics as the Austen cluster graph):



Interesting!
  • Austen's novels (in green) form a neat cluster (apart from Lady Susan, for reasons already discussed).
  • Charlotte Bronte's books (in black) are more similar to each other than to other texts. Wuthering (in grey) is most similar to Charlotte's work.
  • Corelli's books show up as their own cluster -- these texts are more similar to each other than they are to the others. There is a separation here between Romance and Sorrows on the one hand, and Diana on the other. The former two are third-person; the latter is first-person.
  • Most interesting, perhaps: Anne Bronte's work (in red) clusters with Lady Susan. The Anne books are closer to each other than to Susan, but still. And wouldn't you know it: both of Anne's texts are first-person narratives!
  • This cluster graph thinks that these books fall into two main groups: one with Anne Bronte's texts, which are most similar to Jane Austen's; and one with Corelli's texts, which are most similar to C and E Bronte's.


So, the similarity between the Corelli books is much stronger than the separation third-person vs first-person -- her first-person Diana is not included with the other first-person cluster on the top of the graph. In other words: Corelli really has her own style that is very different from the others. (Spoiler alert: Corelli is famous for endless purple prose.)

Finally, I want to make this obvious: this graph does not prove, demonstrate, show, insinuate, hint at or imply that Austen wrote the Anne Bronte books. If pronouns were removed from consideration, these clusterings would look fairly different -- we've seen how much of the variability among the Austen novels is explained by first-person vs third-person. We'd have to run this thing again, without pronouns. But I leave that as an exercise for the reader.

Equally obvious, perhaps: There is almost a century between Austen's work and Corelli's. That is bound to have an impact.

Okay, on to the pca graph for these novels. Again, the only defaults I changed from the steps in >205 melannen: is that I chose PCA (corr) under statistics, and jpg under output.



So. The x-axis is mainly about separating the Austen books from the texts by Corelli and the Brontes. 35% of the diversity across these texts is covered by this separation. Is this because Austen's books are older than the others and the software is picking up differences between English-around-1800 and English-around-1850? Corelli deliberately affects archaic language, so that would explain why her books are closer to mid-century texts than you'd expect them to be. Perhaps there's an aspect of class, as well. Either way, I've gone over my allotted lunch time!

The Y-axis (which covers about 16% of the differences between these texts) is mainly about separating Lady Susan from the rest of Austen's work -- another confirmation that this text must be enormously different from the others. Corelli's texts fall below it; the Bronte texts above it, except for Shirley, which is a little closer to Corelli's texts.

Anne Bronte's texts (in red) are, again, closer to Austen's than the other Bronte texts are, but not nearly as close to Austen as they are to her sisters'.

Roughly, this graph puts Austen's texts on one side, groups the Bronte texts close-ish together, and finds a separation between Corelli's texts and the Brontes' (apart from Shirley).

Better grouping/clustering/distancing measures and a more fine-tuned analysis could perhaps turn this into a fun little blog post. But I've got to go back to work.

254LolaWalser
Dic 8, 2021, 3:04 pm

>119 faktorovich:

Thank you.

>143 Petroglyph:

And thank you, that's illuminating. I understand about the features being hard to fake but I think the real question is the one I note below, off a remark in >244 faktorovich::

Tone is a conscious choice, whereas quantitative linguistic features are unconscious usage patterns that the writer cannot deliberately control.

Not to be a smartass, but, Queneau, Exercices de style... handily disproves this. However, I understand why we'd assume most authors don't bother tinkering with such "invisible" parts of their style as interpunction, contractions etc. That's not (probably) much in question.

What is in question, as far as I can see, is this--do those "quantifiable" aspects of style form absolutely unique patterns? From everything I've read here I'm tempted to conclude "absolutely not". When I think of applying this methodology to the type of writing prevalent in my field, I fear that the conclusion may well be that the entire output was created by one, maybe at most two, people. And there would be good reason to think so based on the neutral, boilerplate, standardised language we aim at--nonetheless, the conclusion would be completely wrong.

255Petroglyph
Dic 8, 2021, 3:05 pm

>237 cpg:
Interesting! Still blind, though. No bargaining with the referee!

256melannen
Dic 8, 2021, 3:06 pm

>253 Petroglyph: Thank you for doing all this! I'm learning a lot.

257faktorovich
Dic 8, 2021, 3:06 pm

>231 Petroglyph: No, the quirks of individual chunks do not disappear when you combine them with many other chunks; by combining or averaging them you simply create an output that is entirely nonsensical or does not represent the true authorial signature in the text as a whole. In contrast, when you just test the text as a whole, you arrive at a precise attribution because you are testing the patterns that emerge when an author writes thousands of words (in this pool those minor glitches in the parts disappear). This is especially the case if a researcher selects chunks just as politicians in the US gerrymander congressional districts, so that the final result is more elected officials who favor the gerrymandering officials' preferences vs. the average political preferences of the entire population.

Your point about rolling the window, re-testing some of the words many times while testing some of the words only once, shows extreme bias and an extreme escape into statistical nonsense-land. The point of this "rolling" seems to be to create experiments that are completely unrepeatable without the Stylo program, since, as you say, it is a method that cannot be duplicated with basic tools/math. By using a method that selects chunks in this unrepeatable manner, and in a format that cannot be shared in its raw form, researchers create an excuse for why they are not sharing the raw data, and thus they have the option to manipulate the data to give conclusions that re-affirm current attributions, even if carrying out the test instead created nonsensical or entirely inconclusive results.

I explain the method for distinguishing collaborative writing in my Re-Attribution book. I reviewed all available scholarship on this topic as part of my research for this section.

...You are mansplaining how computers can be used to do some things but not others? That is pretty rude, and irrational given the depth I go into in my explanation of my computational-linguistic method and rival method in my Re-Attribution book that you still have not attempted to read. I am familiar with the available tools, and the computational tools I chose are the best ones to actually solve attribution mysteries. It is irrelevant how these basic working tools sound or seem to insiders in this field.

I have a filmographic memory, and I recall seeing a data set with differently sized chunks in several studies, but I do not recall the names of these studies. You can figure out which ones I must be thinking of by actually reading my Re-Attribution book, as I recall citing all of these articles in the annotations.

258faktorovich
Dic 8, 2021, 3:10 pm

>245 andyl: Yes, the 284 texts in my Renaissance corpus, the 21 texts in my 19-20th century corpus, and the 100 or so texts in my 18th century corpus have already been tested to prove unique authors are correctly identified. As I have repeated, I am open to running a brand new extensive experiment on any accessible group of texts, if whoever is asking me to do it is familiar with the "standard" computational-linguistic methods and will run and share all of their data just as I will. I don't know why you guys keep challenging me without somebody stepping up to accept the challenge on your end.

259faktorovich
Dic 8, 2021, 3:11 pm

>247 MarthaJeanne: I spent a couple of years revising, rewriting and changing my study and the books in the Renaissance series before publishing these 14 volumes. Are you insinuating that despite being a literature scholar, I do not know the meaning of the term "revising"?

260norabelle414
Modificato: Dic 8, 2021, 3:18 pm

>252 Petroglyph: , >253 Petroglyph: This is all so lovely and fascinating! I'm sorry it's clearly falling on deaf ears (though the rest of the audience is rapt.) At this point you would certainly deserve a co-authorship if any article got through peer review.

261MarthaJeanne
Modificato: Dic 8, 2021, 3:20 pm

No, I was confused that you seemed to think that someone would only go through the text of a book once during the writing of it.

262Petroglyph
Dic 8, 2021, 3:25 pm

>254 LolaWalser:
"What is, as far as I can see is this--do those "quantifiable" aspects of style form absolutely unique patterns? From everything I've read here I'm tempted to conclude "absolutely not". When I think of applying this methodology to the type of writing prevalent in my field, I fear that the conclusion may well be the entire output is created by one, maybe tops two people. And there would be good reason to think so based on the neutral, boilerplate, standardised language we aim at--nonetheless, the conclusion would be completely wrong"

If you mean, completely unique, such that a (long) text written by you can be uniquely attached to you and to no-one else? ... Possibly? If we found similar levels of specificity for everyone else... Maybe? That would require a ridiculous degree of specificity, and absolutely mind-boggling amounts of time.

These methods are used in cases where you suspect Text X may have input from Author A, but some from Author B, too. So, you gather some of Author B's texts, some of Author A's, and you compare them. Which measures are sufficient to separate them? Those are the ones you use. Enter Text X. Plot it against the clusters that are formed by Author A and Author B's texts. Where does it fall?

Point being: you really only look at texts and authors where you have reason to suspect shenanigans. Comparing text X with others until you find something similar (according to poorly-implemented tests) isn't likely to generate valuable author connections. Though stranger things have happened! Perhaps J.K. Rowling has published books under other names than JKR and Galbraith, and someone might find out some day!

I'm not sure if things like, say, boilerplate contracts (to take an example -- I'm not implying you work in this field), where lawyers insert situation-specific agreements into a few clauses using standard legal language and phrases, would show up all that differently. Or even tailor-made contracts, which again use standard legal language and make use of standard clauses. You'd be surprised at what an unrealized preference for certain determiners and punctuation marks can reveal. But texts like these are essentially designed to be indistinguishable from each other. And contracts go through various rewrites by different lawyers, too. I'm not aware of any studies here, but then again: stylometry isn't really my field, though some of my colleagues do research that could be classed under that header. I'm sure it would provide interesting challenges.

263faktorovich
Dic 8, 2021, 3:28 pm

>249 Petroglyph: Yes, that's what I have done, compared the most frequent 6 words/letters in each text to the 6-most-frequent in each of the other texts in the corpus. As I also explained, I used phrases merely to show obvious patterns of only a single author using a given phrase and not any of the others; I did not compare phrases across all texts, but rather only the words and letters.

264Petroglyph
Dic 8, 2021, 3:34 pm

>258 faktorovich:
"I don't know why you guys keep challenging me without somebody stepping up to agree to accept the challenge on your end."

You are the one proposing a model. You are the one proposing a massive, unseen, gob-smackingly unexpected revision of centuries of authorship attributions. You claim your methods work.

The burden of evidence lies on your shoulders. It is not my responsibility (nor anyone else's) to prove your model right; that responsibility is yours alone. It is not anyone's responsibility to prove not-your-model wrong. That is all yours.

Even if you, shouldering your responsibility, could prove that not-your-model was wrong -- that does not make your-model true. You'd have to demonstrate that separately.

Arguing against the status quo to such an extent as you're doing entails doing more heavy lifting than proving a smaller adjustment of the status quo. You have to convince, well, not just us anonymous LT users, but a scientific community.

This is your burden to bear. Don't complain that we don't share it. Because it isn't ours.

265LolaWalser
Dic 8, 2021, 3:48 pm

>262 Petroglyph:

Right, standardised writing is just a limit-case--my point was that completely ignoring context and relying only on analysis to assert how many authors are involved obviously CAN lead to wrong conclusions. But between that and unique styles (if such ever exist, i.e. if there really are any writers who write like no one else ever did or will) there's a lot of terrain where I imagine different people might share to various degrees either "quantifiable" or subjective markers of style, or both. Sufficiently so that a limited analysis might not reveal true multiplicity of sources.

Point being: you really only look at texts and authors where you have reason to suspect shenanigans.

That makes sense to me because it presupposes some basis for comparison. If we assumed for a moment total cultural amnesia, such that we lost all knowledge of authors and were left just with separate texts, it seems unlikely that a purely quantitative analysis could correctly re-establish authorship.

266spiphany
Dic 8, 2021, 4:21 pm

>241 faktorovich: I think you misunderstand me. Why would I want to read hundreds of pages of your writing when you haven't managed to successfully communicate your ideas either in the blog post interview or here?

I'm fully aware that the type of information and the amount of detail in a scholarly study is different than in a shorter summary for a non-specialist audience. Nevertheless -- as I'm sure you know, being a publisher yourself -- a clear and concise summary is an invaluable part of presenting one's work, and an absolutely essential skill in academic writing. It's how you (generic "you") convince audiences, who generally consist of busy people with a long list of potentially relevant literature that they need to decide whether to read, that your work, rather than someone else's, is worth spending their time on.

By the way, the title of Pennebaker's book is not "Secrets" but rather "The Secret Lives of Pronouns", referring not to any incomprehensibility or secretiveness of his writing, but rather to his argument that analysis of inconspicuous function words like pronouns and articles can reveal unexpected information about the speaker/writer.

I'm guessing from your response that you aren't familiar with it; I mentioned it because it is one of the few titles I'm aware of on the topic of stylometrics that is written for a general audience. As I said, I don't agree with all of his arguments or conclusions, but I found it thought-provoking and thought it might be of interest to other readers of this thread.

267Petroglyph
Dic 8, 2021, 4:25 pm

>263 faktorovich:
Look. The most frequent words in an English-language text of any large size are bound to include these: the, and, for, to, a, was, .... Having many of these in your text is meaningless: everyone does.

It follows that combinations of the most frequent phrases ("I don't", "I can't", ...) are meaningless, too. They are merely combinations of frequent individual words. They are to be expected.

What is interesting is relative frequency: which words does this text / author / genre / whatever use more often than the baseline? Those are likely to be meaningful.

Even the tool for "unusual words" on one of your online tools requires you to paste in a "usual word list". "Unusual" words or phrases are those that are used more by a certain author compared to other authors. Do you even have such a list? Would you be able to generate one? I've linked to corpus analysis software in >219 Petroglyph: that would help you extract absolute word frequencies from a representative corpus.

Look. I pasted a link to a .txt version of Pride and Prejudice into your text analyzer. These are the top 6 three-word phrases:

i am sure (62)
i do not (59)
as soon as (57)
project gutenberg tm (56)
she could not (51)
that he had (37)

This does not mean that any of these phrases are diagnostic of Jane Austen! This does not mean that they are diagnostic of P&P! This does not mean that other texts that have these common phrases show Austen's ghost-writership. Look at them! They are combinations of some of the most common words (from the top 50, by the looks of things). It is exactly these kinds of phrases that are entirely expected in novels -- note the past tense (could, had). If this text had less dialogue, phrases such as "I am sure" or "I do not" would not be nearly as high. If Pride and Prejudice were written in the first person, that sixth phrase ("that he had") might have been replaced by a phrase containing "I".

That does not mean that Austen's signature style would have changed. Signature phrases, words, punctuation use etc. are the ones that are unexpectedly common.

Also: I deliberately left in that project gutenberg tm. This shows you that phrases can be frequent, but not typical of an author. They can be frequent for reasons other than authorship. (A book with four volumes and, therefore, four occurrences of "chapter one" would not mean that that phrase is typical of that author!)

If we were to paste your comments in this thread (or one of your books) into a .txt file and feed it to that text analyzer, would the top 6 phrases be diagnostic of you? Go on. Do it. Tell me which phrases are "yours" based on one of your books.

I'm not denying that repeated occurrences of three-word phrases can be indicative of an author. Sure they can. But they must be three-word phrases that an author uses uncommonly often, as compared to a corpus of similar texts in similar genres by similar authors of similar classes, genders, ... Those are useful. Those can be diagnostic. Those can reveal author preferences.
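
(A minimal sketch of that kind of comparison, again in base R; the file name and the "reference" folder are placeholders for whatever target text and background corpus you actually have.)

# Sketch: rank 3-word phrases by how much more often the target text uses
# them than a reference corpus does, instead of by raw frequency.
words.of <- function(path) {
  txt <- tolower(paste(readLines(path, encoding = "UTF-8", warn = FALSE), collapse = " "))
  w <- unlist(strsplit(gsub("[^a-z']", " ", txt), "\\s+"))
  w[w != ""]
}
trigrams <- function(w) paste(head(w, -2), head(w[-1], -1), w[-(1:2)])

target <- trigrams(words.of("target_text.txt"))
reference <- trigrams(unlist(lapply(list.files("reference", full.names = TRUE), words.of)))

t.freq <- table(target) / length(target)
r.freq <- table(reference) / length(reference)

# Smoothed ratio of relative frequencies: phrases the target over-uses
ref <- as.numeric(r.freq[names(t.freq)])
ref[is.na(ref)] <- 0
keyness <- sort(setNames(as.numeric(t.freq) / (ref + 1e-6), names(t.freq)), decreasing = TRUE)
head(keyness, 20)

Ranked this way, phrases that every novel uses at a similar rate stop dominating the list; what floats to the top is closer to an actual preference (boilerplate like licence text would still need stripping first, of course).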

First-person narratives vs third-person narratives will show differences in the "top whatever phrases". The point of identifying an author's style is precisely to look at things that are independent of first vs third person!

Now I know that you have not "analyzed" these three-word phrases while establishing your authorship re-attribution. In >263 faktorovich:, and in >119 faktorovich::

"The 3-word-phrases (the top 6 most-frequently-appearing of these out of all possible 3-word phrases in a given text) I pointed out as revealing in the Brontes' case are not one of the 27-quantitative tests involved in the basic method. I collected these 3-word-phrases for all texts and used them to find obvious patterns of phrases that only appear in the work of any given authorial-style. The use of a contracted vs uncontracted "I can't" is a significant stylistic divergence that is revealing when one of these appears among the top-6 most-common phrases because this means there are many instances of this preferred usage in the text vs. the alternatives. I did not use any 3-word-phrase patterns to establish attributions, but just used them in the writeups to show that the less easily understood mathematic matches vs. non-matches were also confirmed by these verbally-descriptive elements."


What you've done is a) run your tests, come up with some author profile. b) you've then looked at the three-word phrases from that author (or that text), and you have taken the most frequent phrases by absolute frequency as equally diagnostic of an author's style.

This is bonkers.

You've looked at the wrong thing. You've misinterpreted that thing, and attached much, much more importance to it than you should have. This is a mistake. Any conclusions you've drawn on the basis of these phrases have to be revised.

268faktorovich
Dic 8, 2021, 4:28 pm

>250 melannen: A few days are not going to help me learn anything, if there are no steps you can think of to overcome the problem I explained I faced in using this software. There is no manual explanation for this problem in their system.

Here is a copy of an EEBO text - Percy's Coelia: https://quod.lib.umich.edu/e/eebo/A09307.0001.001?view=toc - this text is not available on Project Gutenberg; Renaissance texts that are on Gutenberg are mostly translated, and so are not comparable to original-spelling versions from EEBO and are useless for my experiment. EEBO is in the public domain and accessible to everybody for free. EEBO has plain-text websites that can be copied into Word files or other programs; the main text-processor I use just happens to be Word. Word allows for easy creation of plain-text files and other standard file formats (including UTF-8, but just not the type Stylo sees). No, the standard file formats do not introduce any errors, unless the researcher has made a ridiculous bunch of errors in choosing a format etc. that generates errors. I just tried saving Coelia in Notepad as a UTF-8 and while saving the file in this format worked, Stylo definitely does not see/recognize the UTF-8 format that was thus created. In other words, Stylo needs some other file format to work, and if you don't know what it is, then you cannot have used this system yourself in application to real texts.

269nonil
Dic 8, 2021, 4:35 pm

>268 faktorovich: It seems far more likely that there's a setup error with Stylo, rather than melannen having for some reason lied about the file format needed for Stylo to work. Are you sure your corpus folder is in the right place (as explained in >251 Petroglyph:)? That's the first thing I would check, if Stylo isn't "seeing" the files at all.

270faktorovich
Dic 8, 2021, 4:36 pm

>251 Petroglyph: I changed the folder name to "corpus" and Stylo still did not see the files in it in the UTF-8 format. Now you are adding a step to your method that makes no sense with the previous steps you listed. You said that I just type "Stylo" etc., and then open any folder in my computer where the corpus is. Why would there be an option for me to choose a folder, if the system only recognizes files that are in a specified directory? I tried typing "getwd()" in Stylo and in my computer's command options and nothing happened. Do you want to create a new set of steps that includes placing files in this specific directory, and how to find them etc.? Otherwise, nobody can follow existing steps to get any kind of results with this method. And then you are adding new steps in your last paragraph that you are now saying are needed to get statistics that were not in your initial list, and these steps cannot be applied without far more precise steps as to where/what is being chosen, and how to generate these components. Your method is not practical and the glitches we are discussing should not be there if this tool was as popular as you have claimed.

271melannen
Modificato: Dic 8, 2021, 6:46 pm

>258 faktorovich: >I don't know why you guys keep challenging me without somebody stepping up to agree to accept the challenge on your end.

Hey, I'm going to be really frank here: nobody is taking up your latest challenge because they don't have any reason to.

Firstly, because almost nobody in this thread is really that invested in whether your results are true or not? Some of us are passionate about the idea of outsider science in general, and it makes us sad when someone presents themself that way but has clearly not done the basic foundational work to know what they're doing. Some of us are just unrepentant pedants who can't stand someone getting even the smallest facts wrong without correcting them - and you're getting a lot of small facts wrong. Some of us are just enjoying a good internet argument about something with no stakes whatsoever - it's a nice change. Some of us don't like the idea that LT gave their platform to research that doesn't stand up to fairly basic scrutiny - but that's a problem we have with LT, not a problem with you; nothing you do can fix that problem because it's not a you problem, it's an LT problem.

None of those reasons are the sort of reasons that would make us want to test your method, because none of them really have anything to do with whether your results are valid or not. We would still have the same problems with what you are saying here even if your results were proved 100% correct, because the problem is not your results! It's how you're going about backing them up, which is poorly, and with clear evidence that you don't even understand why you're not backing them up well.

Secondly, because people in this thread have taken up your challenges. You challenged me to give you simple instructions on how to do a reproducible test using Stylo. I provided them. You followed them halfway, then stopped following them and tried to substitute your own data (which wasn't covered in the instructions) and claimed that meant I was wrong about being able to write simple instructions, because my instructions didn't work when you changed some things. This is a super common pattern with people who try to present unusual claims - they will challenge their opponents to do something, and when the opponents do that thing, they "move the goalposts" and present a new challenge which they then claim nobody will try. It's really easy for someone who has seen it before to notice when someone is using this strategy (you are, if you hadn't noticed!) and that makes it obvious that there's no point in undertaking the challenge, because no challenge would ever be good enough to settle things, there will always be a change snuck in or a newer, better challenge.

Thirdly - it's work! I wrote out those instructions because I wanted to play with R and it motivated me to do that. But something like trying to reproduce your whole experiment would be a lot of work, as I'm sure you know! None of us here care enough about attribution of early modern English texts to want to put in the work you've put in - you're the one who's passionate about it! I think the topic is interesting enough that I'd really like to see you approach it with better methodology, but if I'm going to do that much work on stylometry and ghostwriting, I'm going to do Stratemeyer books, because those are the ones I care about. Petroglyph has actual paid work they need to be doing instead. Testing your work isn't our job.

272Petroglyph
Modificato: Dic 8, 2021, 6:55 pm

>270 faktorovich:
What is your working directory? Type "getwd()" (without the quotes).

Make sure your "corpus" folder is in that working directory. It should only contain the files you want to look at. Preferably al in .txt format.

Are the files you've saved in .txt? UTF is a character encoding used in plain text, so your files should be .txt.

Don't use MS Word (or libreoffice or google docs or whatever) as an intermediate. These apps save texts in other character encodings, which may mess with the UTF-8 encoding as you save them as .txt files.

If you copy straight from web pages, please copy straight into a notepad file. Then save as .txt.

If you must use other formats, you can. Look at the first tab when you run Stylo to see what it accepts. It accepts html files.

On this page, I right-clicked on the html text link and selected "save as", then saved as html page. Not "html, complete" -- that saves a folder with images alongside the page. Just "save as", then select "html only".

I did the same on the equivalent page for Sense and Sensibility.

If you paste these two files into your corpus folder, and then have stylo run, you can select (on the first tab) the option .html, instead of .txt. Running a PCA with all the defaults on those two files gives you this:



273bnielsen
Modificato: Dic 9, 2021, 4:11 am

>271 melannen: Very nicely formulated, IMHO.

I like mathematics and I have quite a few math books including some rather silly ones. One of them is written by a medical doctor (x-ray specialist) who wrote a book about patterns and prime numbers. It got a rather unfriendly review by the readers and one of them linked to this page: https://primes.utm.edu/notes/crackpot.html
Please notice that I'm not trying to insult anyone but just pointing out why announcing unusual results is risky business for anyone. Should I one day wake up and have a proof of the ABC-theorem in my head, I hope I'll go through the crackpot.html check list before calling national media.

ETA: A bit of the x-ray doctor's stuff: https://mersenneforum.org/showthread.php?t=6409 which triggered the crackpot alarm.

274lilithcat
Dic 8, 2021, 6:32 pm

>273 bnielsen:

this page: https://primes.utm.edu/notes/crackpot.html

Okay, that page is awesome. And nearly all of it applies in fields other than mathematics. Love it.

275Petroglyph
Modificato: Dic 8, 2021, 6:42 pm

276Keeline
Dic 8, 2021, 6:44 pm

One of the problems with letting Microsoft Word anywhere near a text for machine processing is the default behavior of converting "straight quotes" (ASCII 0x22, ") to typographers' or "curly" quotes (“ ”). This also applies to apostrophes, dashes, etc.

These can be saved into a text file with a UTF-8 character set but the two types of characters are not equivalent to the computer and will confuse software trying to process it.

You can turn this behavior off but most do not. When I have been involved in web applications there has been no end of problems with users doing a copy-paste from Word because of these defaults. Often it is necessary to come up with a series of search-and-replace routines in the code to catch them to achieve a "paste from Word" function. Yet, whatever you think to include, there will be some other one that will not be caught. It is like a child's game of Whack-a-Mole.

Even pasting to Notepad or most text editors will not help because the characters are legitimate but they are different.

Software like BBEdit on the Mac can run a routine to "straighten quotes" but this doesn't touch the em-dash, etc.
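
(If it helps anyone following along, a minimal R sketch of that kind of clean-up step; the function name is just illustrative.)

# Sketch: map Word's typographic characters back to their ASCII equivalents
# so that "curly" and "straight" variants are not counted as different tokens.
straighten <- function(txt) {
  txt <- gsub("[\u201C\u201D]", "\"", txt)  # curly double quotes
  txt <- gsub("[\u2018\u2019]", "'", txt)   # curly single quotes / apostrophes
  txt <- gsub("[\u2013\u2014]", "-", txt)   # en and em dashes
  gsub("\u2026", "...", txt)                # ellipsis character
}
straighten("\u201CI can\u2019t\u201D \u2014 she said\u2026")
# [1] "\"I can't\" - she said..."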

James

277Keeline
Dic 8, 2021, 6:58 pm

I saw some reference to post-1926 texts being off limits for the study. This sounds like a simplified rule concerning U.S. copyright law.

As with most things, the reality is more complicated.

As I recall, the 1927 texts don't become public domain in the U.S. until 1 Jan 2023. Thus, even the 1926 texts would not be available until 1 Jan 2022.

Up to a certain point there was an opportunity to renew copyrights beyond the first term of 28 years for works for hire under the 1909 law. In the year following the 28th after copyright and publication, the owner (publisher, author, heirs) could renew the copyright and get another 28 years of protection (56 total).

Extensions to the Copyright laws in the 1970s and 1990s have taken us to our present level of 95 years for works for hire. I find them to be absurdly long in an era when there is pressure to make 20-year patents shorter.

The result of this is that works first published between 1923 and 1963 which were not explicitly renewed (with a record that can be found from the Copyright Office) are now in the public domain.

This can happen with publishers that went out of business like Saalfield who had the copyrights for their books but were not around when the renewal time came up. Some authors and even publishers were sloppy about renewing their properties.

So, in an effort to prove a negative, you look at databases like the Stanford Copyright Renewal database (based on the initial efforts of a Google engineer to accurately digitize published renewal records). It usually pays to search several ways to see if a work was renewed. If it was not, there is an excellent chance that it is now public domain.

Using this method, I have been able to convince groups like HathiTrust.org or Google Books to make certain works available in full view on their systems. Once this is done, it is usually a matter of time before they are made available on Gutenberg.org for a text that might be suitable for textual analysis through stylometric software.

Thus, there may well be texts after 1925 that are available for analysis.

My field of specialty is juvenile series books and I am writing an extensive Series Book Encyclopedia which deals with nearly 1,300 such series and thousands of volumes. I have noted which of the 1923-1963 works appear to be public domain since this provides a clue about which works might be appropriate for digitization projects.

James

278melannen
Modificato: Dic 8, 2021, 7:17 pm

>268 faktorovich: If you can't learn things, there's no point in any of this. A few days won't make you an expert in a new piece of software - but they will give you time to work through enough tutorials and free video courses to know where to start. (Like I said! I wrote those instructions with half an hour and google! I am *not* a stats expert - I'm just someone who is willing to try learning things, even things I don't immediately understand.)

>>No, the standard file formats do not introduce any errors, unless the researcher has made a ridiculous bunch of errors in choosing a format etc.

I am trying to tell you that somebody using Word to prepare text data for statistical analysis *has* made an error in choosing a format. Don't use Word for anything other than preparing a text for printing! It's not good at other things! More importantly, using it is going to broadcast to anyone who does this work that it has ridiculous errors!

I didn't go into detail about text encoding (which is different from file format) in my original instructions, because that wasn't the goal of the instructions - the point was to demonstrate that anyone could do a simple Stylo run in fifteen minutes from scratch, with basic instructions. I'm not going to go into detail about text encoding here, either, because that is basic stuff anybody working with digital text files should know, and, again, if you're willing to try to learn stuff you don't know, it's easy to find internet resources about it. (Is it simple? At the beginning it is, but it can get super complicated. However, it's not something you can overlook because it can matter a lot to how valid your results are! If you'd come at me with justifications of why you'd chosen Word and how you'd carefully compensated for its shortcomings, I would be with you all the way. But you didn't, you came to me not even knowing what UTF-8 is.)

I am going to try to figure out why your Stylo test isn't working though because I'm a sucker for an unsolved problem.

(First off, the texts on that website aren't plain text. They're xhtml/sgml. It says that right on the "About" page. Your web browser makes that xhtml look like formatted text, because that's what web browsers are for. When you copy it and paste it somewhere else, some of the information in the xhtml about how to display the text is going to change, because it won't be xhtml in a web browser anymore. Some of those changes may make your stats results inaccurate. But that's all beside the point really.)

Anyway, I copied-pasted the Coelia text to Notepad and saved it as Coelia.txt, which made it a UTF-8 txt file. And then I followed my instructions 4-8 exactly. It gave me an error because the point of that test is to compare texts to each other, and you can't compare one text to itself. Is that your problem? I copy-pasted Faustus from that website too, saved that to the Corpus folder from Notepad, and tried again, and now that it had two things to compare, it worked.

If you had two or more .txt files to compare, maybe the problem is that you had other files in the folder as well? My instructions didn't mention that you couldn't have any other files in the Corpus folder, because they told you to create a new folder from scratch. If you have any files in Corpus other than .txt UTF-8 files, delete them, and then try again.

If neither of those things work, I can't say. I can say the problem isn't the way you're converting to UTF, because that works for me.

(Stylo doesn't actually require UTF-8, by the way. That's just what it assumes you're using if you select "plain text", because it's the most common plain text format. If your originals are xhtml files, then you should select "html" instead. The options are right there! It does require you to know the difference between html and plain text, though.)
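
(For what it's worth, the whole GUI and file-format dance can also be skipped by scripting the run. A minimal sketch, assuming a recent version of the stylo package and a "corpus" folder inside the working directory; argument names may differ between versions, so treat this as a starting point rather than gospel.)

# Sketch: a scripted stylo run with no GUI.
library(stylo)
setwd("C:/Users/me/stylometry")   # illustrative path: the folder that contains "corpus"
stylo(gui = FALSE,
      corpus.format = "plain",    # or "html" if you saved the pages as .html files
      analysis.type = "CA",       # cluster-analysis dendrogram
      mfw.min = 100, mfw.max = 100)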

279Petroglyph
Dic 8, 2021, 7:17 pm

280faktorovich
Dic 8, 2021, 8:59 pm

>254 LolaWalser: The description of "Exercises" explains that it varies genres such as the "sonnet" and "mathematical formula". Genres are entirely conscious choices and can obviously be manipulated by a scholarly writer who is familiar with their formulas. I don't know why the author has called this "style", but perhaps something has been lost in the translation. There is no question that authors intuitively have an aversion or an attraction to exclamations, and have rhythms of punctuation that feel natural to them while others don't. The patterns are only entirely unique when a large number of different elements is measured as is the case with my 27 tests. While even the exclamation rate can vary between texts the combination of multiple elements is still going to register the underlying author's signature. If you follow the simple steps I provided, your conclusion will be entirely right, unless you make unpredictable selection-bias choices in building your corpus or make mistakes with the process. Let me know if you want me to look over your data after you attempt an experiment.

281prosfilaes
Dic 8, 2021, 9:08 pm

>277 Keeline: Using this method, I have been able to convince groups like HathiTrust.org or Google Books to make certain works available in full view on their systems. Once this is done, it is usually a matter of time before they are made available on Gutenberg.org for a text that might be suitable for textual analysis through stylometric software.

With my English Wikisource admin hat on, may I point out that we're here too, producing transcriptions of works from HathiTrust and Google Books?

282faktorovich
Dic 8, 2021, 9:08 pm

>264 Petroglyph: You are saying that the history of Britain is my burden alone. Not you, or any other scholar, but me specifically. I agree. Absolutely. I am working on testing the set of texts you ran through your experiment, and I will post the information I gather when it is finished. I am just waiting for the software to finish processing, and thought I'd respond to a few other posts.

283faktorovich
Dic 8, 2021, 9:14 pm

>265 LolaWalser: If you only test word-frequency or any other single measure, it will become difficult to distinguish between styles. But fingerprint analysis works because there are many ridges and curves and other unique features, all of which jointly create a unique set of features that is extremely rare and may appear in only one human on the planet. This is why I use 27 different tests in combination. We should measure more texts with computational-linguistics, and we surely should not test only the mystery-texts without comparing them with a significant number of other texts in the surrounding canon, to check whether other texts that do not seem like mysteries might actually be, or to determine who out of all likely authors is the real underlying author. Instead too many computational-linguists start with an assumption that they have guessed who the author is and only test these assumptions on a handful of texts/bylines.

284prosfilaes
Dic 8, 2021, 10:35 pm

>283 faktorovich: Looking at more information is fine, but calculation without calibration is always a danger. As I said, it's easy to use calculations to establish that every one of those 284 works has a different style and hence a different author, or different calculations to show that every work is written in the same style which can be labeled as English (as opposed to those who write in a style known as French or Spanish or Estonian or Swahili.)

If you assume that most of the works are correctly labeled and are trying to figure out certain anonymous works, it's easy to check that your calculations are producing reasonable answers. The further you go away from that assumption, the harder it is to say anything. If you know nothing about the authors or the works, >253 Petroglyph: could be based on the works of 16 different authors or 2 or 1, depending on how much distinction between the styles you think differentiates an author.

Computational tools can't tell how much variation you can expect from a single author, or how much difference you can expect between author's styles, or even if they're a reliable tool for separating authors at all. And it could very well be different for Renaissance English than it is for modern English, given the reduced cultural variation and different literary forms. When you say there's six ghostwriters, that's purely bias; someone could look at the same data and come up with more authors, or possibly even less. Like many have done before, you're tossing around numbers and coming up with answers that match your bias, and not admitting (or possibly even understanding) the role your bias played in getting the answers you got from the numbers you got.

285lorax
Dic 9, 2021, 9:46 am

You say your data is available on github, but what really matters is your code. I don't know anything about that period of English literature, but I've been doing data science since before it had that name, and can comment on methodology. In particular, I'm interested in:

* Is this a clustering mechanism where the number of clusters (authors) is unknown and the algorithm seeks to simultaneously determine the optimal k number of clusters and to attribute individual works to a particular author?

* If so, what is the mechanism used for determining optimal k? That's one of the more complex and nuanced parts of unsupervised clustering algorithms, and is critical for your situation to get right.

* How does your methodology perform against texts of known authorship, particularly when an author's style may evolve over time? Does it correctly attribute early and late works of the same author to a single author?

* Relatedly, does your methodology correctly distinguish between authors of texts of known authorship when they're known to have been in close communication with each other?

* How do you construct your feature sets? Why do you select the features you did? Clustering on categorical as opposed to numeric features can be very challenging - what is your distance metric?
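
(On the second of those questions, for anyone following along: a minimal sketch of one common way to probe the number of clusters, assuming "features" is a texts-by-features numeric matrix. It is an illustration of the general problem, not anyone's actual pipeline here.)

# Sketch: run k-means for a range of k and look for an "elbow" in the
# total within-cluster sum of squares; silhouette widths are another option.
set.seed(1)
x <- scale(features)
wss <- sapply(1:10, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)
plot(1:10, wss, type = "b", xlab = "number of clusters k",
     ylab = "total within-cluster sum of squares")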

286susanbooks
Dic 9, 2021, 11:53 am

>285 lorax: "Relatedly, does your methodology correctly distinguish between authors of texts of known authorship when they're known to have been in close communication with each other?"

Y'mean like the Brontes, who were writing together since before they were ten?

287Petroglyph
Dic 9, 2021, 12:33 pm

>285 lorax:
Excellent questions, all!

288lorax
Dic 9, 2021, 12:37 pm

susanbooks (#286):

Bingo. But for other examples. If it can't distinguish between authors who began corresponding as adults, how could it possibly distinguish between siblings?

289LolaWalser
Dic 9, 2021, 1:55 pm

>280 faktorovich:

I don't want to belabour the point with a single example, I was merely addressing your remark that

quantitative linguistic features are unconscious usage patterns that the writer cannot deliberately control.

Queneau does deliberately vary features mentioned as "quantifiable" markers of style (not just genre or tone). And, while this book in particular is based on a short text, there are other longer, even novel-length examples of Oulipian literature produced with constraints that result in what I suppose one might call artificial style, different to those authors' unconstrained writing.

In short, I take your point that there are features we reproduce in our writing "unconsciously", but I'm not convinced that they can't be or often aren't manipulated at will.

290faktorovich
Dic 9, 2021, 2:59 pm



I have carried out a verification-check titled “Lunch Test”. I have added the above diagram and the full data table for the raw numbers and the attribution steps to my GitHub: “Lunch Test - Attribution Diagram.jpg” https://github.com/faktorovich/Attribution/blob/master/Lunch%20Test%20-%20Attrib... and “LibraryThing - Lunch Test - Data - 12-9-2021.xlsx” https://github.com/faktorovich/Attribution/blob/master/LibraryThing%20-%20Lunch%.... This test re-affirmed the earlier findings of my “Koppel Experiment Reviewed - Data Tables.xlsx”: https://github.com/faktorovich/Attribution/blob/master/Koppel%20Experiment%20Rev.... There are only two linguistic signatures between the three “Bronte” sisters, and since only one text from “Emily” could be tested, I have assigned the byline for this linguistic group to the multi-texted “Charlotte”. “Anne” and “Charlotte” clearly have distinct signatures, as do “Austen” and “Corelli”. There were a few linguistic intersections between all of these bylines that indicate occasional co-writing of some of these texts by the authors in these distinct linguistic groups. One such example is “Austen’s” “Northanger” that is similar to “Anne’s” “Agnes”. The most significant example of this cooperation is between Corelli texts and a few “Anne” and “Charlotte” texts. The data could not establish if any of these bylines are genuine or if they represent ghostwriting-purchasers/pseudonyms or the like. It only establishes the degree of similarity and difference between these texts. This test confirms the accuracy of my 27-tests computational-linguistics method because nearly all of the texts correctly show similarity within a given byline, and divergence between that byline and the other bylines in the corpus. If the results were random, there would not have been any noticeable pattern in the data. Most of the 10+ test matches are correctly located on the data table within the same bylines. Thus, when these matches show up in other bylines as well or when two bylines match each other’s texts, these byline-breaking matches accurately indicate two or more collaborating signatures, or a single signature using two or more pseudonyms.

In addition to this summary table, the full data set includes the relative word-length of these texts, the year when they were published and other data to assist future researchers. It also includes the exact numbers of exclamation marks and the other test results that contributed to these matches/non-matches. It also includes the data on the top-6 most-common letters, words and 3-word phrases in this corpus. As you can see from this information, this group of texts are unusually similar to each other, as 9 out of 16 of them share the same b-pattern of top-6 letters (e, t, o, a, n, i) – all of Austen’s 6 texts use it; and 9 of them share the A-pattern of top-6 words (and, the, to, I, of, a) – including all 5 of “Charlotte’s” texts (including the “Emily” likely-pseudonym). There are also a strange number of texts that share the same most-common 3-word phrases: 3 with “I am sure” (preferred by “Austen”), 4 with “I could not” (mostly preferred by “Charlotte”), 3 with “I do not”, and 3 with “I don’t” (the latter 2 phrases are shared by “Austen”, “Corelli”, “Emily”, “Anne”, so they are likely to belong to the main collaborator in this group). The presence of these byline-specific letter, word and phrase preferences further confirms the quantitative attributions in this experiment.

The broad conclusion of this test is that the current accepted byline-attributions and the belief in these authors' purely independent authorship are erroneous. I found similar collaboration and ghostwriting in male bylines from these decades in the "Koppel Experiment", so this is not a statement about female authorship as opposed to male authorship. This is simply an honest representation of professional writing in the highbrow canon we have been learning in school. If any of these female bylines were actually men working under female names to attract readers, this means that actual women might otherwise have found these publishers, who were aiming to tap female-readership. If a man was paid for the writing-labor credited as the "work" of a glass-ceiling-breaking "woman", retaining his byline on a book under the assumption that the existence of this byline empowers all women is faulty. Then again, the underlying ghostwriters could have been women, but only a testing of hundreds of bylines from this century and a full study would uncover the precise truth. This experiment simply distinguishes the linguistic signature in this small sample corpus.

291spiphany
Dic 9, 2021, 3:18 pm

So the only possible and reasonable explanation for a similarity between texts attributed to two different authors is collaboration, or else some hitherto unsuspected ghostwriter must be responsible for both?

BTW, I'm bad with dates, but a quick google search indicates that Marie Corelli was born in 1855; Charlotte Bronte died in 1855, Emily in 1848, and Anne in 1849. Jane Austen died in 1817, when the oldest of the Bronte sisters was only a year old.

Yet you think it is plausible that they collaborated with one another? Or that there was some ghostwriter of such longevity that his work spanned several generations?

Alrighty then. Carry on.

292prosfilaes
Dic 9, 2021, 3:33 pm

>290 faktorovich: One of the key characteristics of science is falsifiability. As far as I can tell, there's no set of results that could be produced by your methods that you would consider evidence that they're wrong. If your test disagrees with well known facts, then you've felt free to dismiss those facts as wrong.

293faktorovich
Dic 9, 2021, 3:33 pm

>291 spiphany: To answer your question, I organized all of the texts in the table by their publication-year. The 5 earliest-published texts match the "Austen"-signature, with only 1 "Austen" text first-published in 1871, though Austen had died back in 1817. One of the "Austen" overlaps - "Anne's" "Agnes" - was published in 1847. If you let go of the idea that either "Austen" or "Anne" had to be the authentic authors behind these bylines and consider that a ghostwriter could have started writing under the "Austen" byline in the 1810s or earlier and kept writing after Austen's death into the 1840s or later, then there is no abnormality in this data. The collaborative match between "Corelli's" "Young Diana" (1918) and "Charlotte's" "Shirley" (1849) suggests that Corelli was definitely not the underlying ghostwriter who wrote under her byline, as Corelli was born in 1855; Corelli happened to be born under a different name that she changed to the pen-name, so this begins to re-affirm the ghostwriting conclusion. It is likely that texts were written close to the year when they were published, and in this corpus a few of the texts were first-published after their byline-author's death; therefore, the claim that the byline-authors were dead or not yet alive when the first texts re-attributed to them were written only confirms my findings.

A match between 2 different bylines can mean a few different things: 1. collaborative writing between them, 2. single ghostwriter/author writing under 2 bylines, and a few other scenarios. The number of matches within the byline and a few other elements decide what scenario is the most logical for a given couple of texts.

294Keeline
Dic 9, 2021, 4:04 pm

I deal a lot with the complex roles of ghostwriters in juvenile series books of the 20th Century. There are cases where there are multiple "authors" in a given published text and it is nearly impossible, without an edited manuscript, to learn who did what.

For example, the Stratemeyer Syndicate provided its ghostwriters with an outline of a couple to dozens of pages from which they produced a book manuscript of a couple hundred pages. When the writers transcribed content from the outline, the style should be more like the author of the outline. After a manuscript was turned in, there would be rewrites by one or more people as part of the editing process. It is easy to have three or six "authors" in one of these books.

I am prepared for the comment of "how they did things in the 20th Century is irrelevant to earlier periods." I disagree. Further, just how professionalized are we to believe authorship was, from the 1500s through the early 1800s, for ghostwriters to have been working everywhere, and just a handful of them at that? It just is too much like a conspiracy theory to be believable without some very solid proof, both intrinsic and extrinsic to the texts.

With something like the Brontë sisters acting as a writing community and forming a read-and-critique circle, wouldn't the suggestions by the others influence what was published?

Also, the usual notion is that female writers adopt male or gender-neutral (e.g. initials) pseudonyms to increase readership (e.g. fantasy and sci-fi genres). However, you are saying that some male ghostwriters used female pseudonyms to increase readership. Why were the earliest versions published under male names which are now seen as pseudonyms?

James

295Stevil2001
Dic 9, 2021, 4:27 pm

This thread is the gift that keeps on giving.

"If there's nothing wrong with me, there must be something wrong with the universe."

296faktorovich
Dic 9, 2021, 5:35 pm

>253 Petroglyph: As I explain in a separate post, I have performed this experiment using my 27-tests method. I used the same plain-text files you posted the link to. I am now going to explain why your findings are faulty in comparison with my own. The first problem I have noticed is that your diagram uses "Classic Delta distance" as a measure, but you do not define what this is a measure of precisely. This is what I found about this concept on a brief search: https://rdrr.io/cran/stylo/man/perform.delta.html - Stylo does not specify which specific "measurements" are included in this calculation, so it can be anything. What does the spectrum between 2 and .5 mean? These numbers do not make any sense without non-matches for reference.

My data also contradicts that "Susan" is significantly less like "Emma" and still further less like the other 4 "Austen" novels. "Susan" has 7 matches to both "Emma" and "Northanger", and "Susan" is most like "Mansfield" that this diagram shows instead to be furthest from it. Both "Susan" and "Mansfield" share the b-letter-pattern (e, t, a, o, n, i). The first-person in "Susan" is reflected in the top-6 words measure because "I" (as well as "her") is present among its top-words, whereas "Mansfield" does not have "I" among the top-six (and only has "her"). Because the method described by Petroglyph uses word-frequency as its main linguistic test, his attribution conclusion regarding "Susan" is heavily influenced just by this first vs. third person difference. Because word-frequency is only 1 out of 27 tests that I use, this single measure did not influence the overall attribution of "Susan" as similar to "Mansfield" in my results.

However, it would be an error to subtract most-common words from consideration altogether because both "Susan" and "Mansfield" still shared the "her" pronoun that this author clearly favors. In most texts I have found the top-6 words very useful in revealing patterns, and they more frequently confirm the overall attribution than contradict it. With this test in the mix, researchers can see what influence voice-change had on the linguistic results.

Petroglyph's "Cluster Analysis" visual of all of the texts compared to each other also re-affirms my finding regarding "Emily's" style being similar to "Charlotte's"; his visual just happens to place it on a slightly divergent bend, and thus Petroglyph sees this as sufficient to say that "Emily" and "Charlotte" form two distinct signatures, and not variations within a single authorial-signature.

The biggest problem with Petroglyph's entire "Lunch" experiment is that he did not even include a link to the raw numeric data, and just shared these visuals. He is not sharing exactly what features were measured; he needs to list these, so I can double-check why the elements he measured yielded these results. In contrast, all of my data-set is available, and anybody replicating the steps of my method can check the accuracy of my process, and trace back the reasons for the summary results.

"Diana" is indeed a bit different from the other 2 "Corelli" texts but there are many other significant elements to consider aside for the first/third-person voice, including "Diana's" publication decades after the other two. "Diana" has a different letters and words pattern from these other two novels; but it also shares phrases such as "one of the" with them. "Diana" actually has the "she" pronoun among its top-6 words, whereas the other two include "I" in the top-6; this makes me question if Petroglyph might have confused the first and the third-person voices between these novels. On all 27-tests combined "Diana" is not significantly different from the other two.

My findings also contradict that "Anne's" novels are similar to "Susan", as they are a non-match at only 5-7 tests in common. For example, these three texts have different top word patterns, as one favors "I" another "her" and another "you". Without the raw data, I cannot research what this testing method could have found to be similar between them.

And Petroglyph has repeatedly said that he removes the most-common words from consideration to avoid the bias that comes with first/third-person/ generic variation, but in contrast to these claims he is saying that these pronouns have had an attribution-changing contribution to these findings. This would have been the case if he just tested for the most frequent words in these texts (only 1 of my 27 tests).

My test took me a day instead of just lunch-time, but the data it provides proves the case with overwhelming evidence; whereas Petroglyph's method just plots relative similarity points or a single data-output conclusion. He is not even showing all of the text-to-text comparisons or the degree of similarity between them, and instead is showing their relative linguistic values. If readers trust these conclusions, these are nice visuals; but anybody who doubts these conclusions cannot check them to see what might have led to misattributions. The 35% and 16% figures he gives are entirely fictitious and there is no explanation what they can be based on; they just appear to be used to make the whole thing seem more detailed than it is.

Overall, Petroglyph is reporting similar cross-byline similarities that I am with my method; but he is denying that these similarities indicate the presence of pseudonyms, collaborative writing, ghostwriting etc. What this data is saying is not debatable. If there are cross-byline similarities; it cannot be true that all of the authors are only writing under their own "true" bylines.

297faktorovich
Dic 9, 2021, 6:01 pm

>276 Keeline: These various replies regarding UTF-8 working in Stylo with specific adjustments all really mean: there is an enormous number of fine-point problems that have to be known just to perform the basic step of accessing a folder with the files to test. There are not only no manuals that address these possible glitches, but even specialists do not know which of these potential problems can be plaguing a given user. All this just means that Stylo is not usable by the public, either intentionally or because the programmers did not have time to write up a coherent manual.

298faktorovich
Dic 9, 2021, 6:20 pm

>278 melannen: The texts I used were UTF-8. Are you saying Word cannot create a UTF-8 file, even if it has an option for this type of plain-text file creation? Your responses are nonsensical. A set of simple steps for any method means a set of steps the user can follow to perform the test. If you think everybody should know advanced programming before they can use your simple steps, then they are steps meant for advanced programmers, and not for the general public. You have been asserting that your method is accessible to the public, so you have to choose if your method is too complicated, or too simple, for either of these arguments to make sense. I did not download texts from any website in xhtml/sgml; I have no idea why I would have made this absurd error. As I explained before, I tested creating a txt file in Notepad, and it was also not recognized as an acceptable file-type by Stylo. Stylo did not recognize other types of plain-text either when I tested it.

299faktorovich
Dic 9, 2021, 6:24 pm

>284 prosfilaes: There is no bias in my re-attribution of the British Renaissance to 6-ghostwriters. I am simply reporting what the data says. You can look at the exact data on my GitHub to check this conclusion. Nobody has ever compared 284 X 284 texts against each other in this Renaissance corpus, so there are no rival arguments about this group of texts as a whole. All previous studies have been about small(er) groups of texts that only tested single elements such as word-frequency and thus they have not tapped into the underlying linguistic patterns I describe in my book.

300faktorovich
Dic 9, 2021, 6:34 pm

>285 lorax: You have not read the interview with me that this discussion thread is about. I explain the steps involved in my process there. They are steps that anybody can do with their home computer and free software; they do not involve any coding. Yes, I cluster authors together, but based on the degree of similarity or divergence they have with other texts. The main difference with my method is that I do not assume any bylines are accurate, and thus do not start with byline clusters, but rather check which texts are similar to each other and let their similarity determine whether they belong in a single cluster. The point of my findings is that the concept of "known authorship" is faulty. The byline on a text might or might not reflect the true underlying author behind that text. For example, more than half of the texts in the Renaissance were originally anonymous, but most of these now have bylines that were assigned to them by scholars across the past 400 years; these assigned bylines are considered as "known authorship". Meanwhile, some texts that initially had "Shakespeare's" byline have been de-attributed to other bylines by scholars; and these other bylines are now "known authorship". Testing if an attribution method re-affirms "known authorship" is like testing for a murderer by considering all outcomes that lead to anybody other than Suspect X as faulty methodology. The features I use and the data related to all of them are included in the spreadsheets I have made available for public viewing/testing on GitHub. I measure divergence or "distance" by the number of tests, out of 27, that two texts match on, so this is a very complex and thus accurate 27-level metric.

301faktorovich
Dic 9, 2021, 6:38 pm

>292 prosfilaes: "Falsifiability" is not a "key characteristic" of all science. The gravity equation g = GM/r2 might be something that cannot be proven to be wrong; this does not mean that gravity does not exist, or that this equation must be unscientific.

302Petroglyph
Dic 9, 2021, 6:39 pm

>290 faktorovich:
Alright, let's take this at face value for a second.

  • What is the cutoff for similarity? The 10+ that you mention is clearly supplemented by 7 through 9 as well. In your spreadsheet there's several values of 7 (several in the Jane Austen block, CBronte block and Corelli block) that are judged as "similar". But a score of 7 between Tenant and Lady Susan, between Tenant and Mansfield isn't. There are multiple instances of 8 and 9 that aren't highlighted as co-writing similarities (such as between CBronte and ABronte). What is the rationale behind allowing some of these to "count" but not others?

  • A single individual behind both Charlotte Bronte and Marie Corelli? This needs no comment from me.

  • Posthumous publication of an unfinished book by a world-famous author? Naaaah! Impossible! Co-authorship!

  • In this "forthcoming article" of yours, did you already note your "similarities" between CBronte and Corelli? Were you aware of them? Will you incorporate this new information into your article?

  • If an author like Corelli uses language and speech patterns that were old-fashioned and even archaic for her day, would that not account for similarities between her texts and those from thirty, forty years before? Why posit co-authorship?

  • How do you distinguish between actual similarities and coincidences? Books authored by the same individual vs different people with similar styles. If you keep looking for similar author signatures, you are bound to find similarities that are due to chance and not shared authorship. How do you distinguish the two?

  • Logically, if you have 27 tests, each of whose results you manually alter to 1 or 0, you have a total of 729 potential author signatures that your method can distinguish from each other (27^2). There are only 729 combinations of one-or-zero-scores that you can add up to the number you put in this table (7, 10, 26). Well, you add a test, so the max is 28. Even so, that means 28^2 = 784 logically distinct possibilities. So. There must be a number of authors whose patterns match up but who could not possibly have produced the same body of texts (because they were born 150 years apart, or sth).

  • I've already explained in >267 Petroglyph: why the "top whatever words" are no corroboration of anything whatsoever. So that takes 6 tests out of your 27. The same thing goes for the "top 6 letters" (It is not unreasonable to consider the possibility that a novel about the Dashwoods may show different letter counts than a novel about the Bennetts. You need a way of preventing such incidentals from measurably impacting your diagnostic features, and character names and locations may impact those raw frequencies. Also, Have you removed all "chapter one" etc from your texts? Those will impact counts, too!). So there go another 6. That leaves really only 15 tests. 15^2 = 225.


      No wonder you keep re-attributing lots of authors to a tiny number of "ghost-writers".

303Petroglyph
Dic 9, 2021, 6:41 pm

>301 faktorovich:
Well, after Einstein and Galileo, we're on to Newton.

304faktorovich
Dic 9, 2021, 6:47 pm

>294 Keeline: There have been known writing workshops that used pseudonyms or a communal name, from ancient Judaic literature, to Renaissance Italy, to the 20th century as you point out. There is nothing unique about the 20th century or the Renaissance that would make the case that ghostwriters were working during these times any more or less believable. The central distinction about the British Renaissance might be the small number of these ghostwriters (6), but this number makes sense when the vagabond/fraud/monopoly laws I explain are taken into account. They could not really advertise or even privately solicit for new ghostwriters to join the group without facing fraud etc. charges. Imagine you were a ghostwriter/publisher who worked with ghostwriters: you might slap semi-random pseudonyms (male or female) on minor publications that you don't think will be very successful; but if a marketing ploy of using female names comes up later on that gives a set of books an advantage, you would leap bylines. Examining existing records with these possibilities in mind is likely to lead to very different conclusions than if a researcher accepts the "Bronte"-bylined handwritten manuscripts as unquestionably "known" to be their own, without considering the possibility that they could be forged.

305faktorovich
Dic 9, 2021, 7:11 pm

>302 Petroglyph: The gray boxes I entered into the table are designed to assist the viewer with seeing the overall similarity between the texts in the cluster. There are degrees in their similarity: a higher similarity means sole-authorship, whereas a lower score can mean co-authorship or that it is indeed largely the work of another author. 7-9 is the gray-area; in some cases two texts might have a 7 when X is compared to Y, whereas when the order is reversed they might have 9 tests in common; thus, it would be unfair to disqualify all 7 matches without considering if a given text matches other texts in the byline being compared against etc. You are fixating on these gray areas, when most of the table shows far less than 7 matches in non-matches or in the parts of the table that are not highlighted in any color.

I did not say there was one author behind "Charlotte" and "Corelli" - I said there was a collaborative author who helped both of these bylines separate authors - this collaboration is the reason these bylines register as largely different from each other, but with some overlaps between them that show collaboration, which could have been one of them helping the other one, or a third collaborator intervening to help both of them.

Yes, in my study I have found that most posthumous books are ghostwritten or share linguistic styles with otherwise-bylined texts that were written after the bylined-author's death.

I did not analyze Corelli in my forthcoming Journal of Information Ethics article; I did an entirely new study just for you. The findings on Corelli do not add anything new to my analysis, as I discuss 21 texts in that article and there are several other mysteries that I solve that are more interesting than the Corelli case.

"Old-fashioned" language? Just like comedians can mimic accents and "old-fashioned" language, so can most professional writers.

The more I apply computational-linguistic tests to texts and research these authorship mysteries, the more convinced I am that there are no coincidental matches; all matches indicate shared authorship. You can call this similarity editorial-input or the like, but if most of the text is the original work of this "editor", he or she is the true underlying ghostwriter.

The 729+ possible combinations you refer to is the reason my method can distinguish between the degrees of similarity. Obviously, my method is not equivalent in precision to handwriting analysis or DNA, as it is far more likely that two humans in the world have similar writing styles than that two have identical DNA. However, not all humans on this planet write books professionally. To make a living from writing, the truth is a ghostwriter has to write constantly. These top ghostwriters/writers have the stylistic patterns that are of interest to researchers. According to my findings, most ghostwriters publish at least one text with their own byline, which makes it possible to identify them as the underlying author. Renaissance documents have been digitized and made publicly available in a way that makes it possible for a researcher to check documentary proof, or the lack of it, for the existence of a byline, to separate pseudonyms from real people, etc. I have already considered nearly all of the bylines on all published texts from the century of the Renaissance before reaching my conclusion about the 6 ghostwriters and naming them. This is why I have proven the Renaissance case in my series beyond reasonable doubt, whereas any case I make about the Brontes in these mini-experiments is only a surface glimpse into their authorship attributions.

The top-6 letters are only 1 and not 6 of my tests, as is the case for the top-6 words: these are just 2 out of my 27 tests. If you actually attempted to follow the steps of my method to see how it works, you would not make this error.

306Petroglyph
Modificato: Dic 9, 2021, 8:42 pm

>296 faktorovich:

Here we go.

"The first problem I have noticed is that your diagram uses "Classic Delta distance" as a measure, but you do not define what this is a measure of precisely."
Burrows' Delta. Hover over the various options in Stylo. It'll tell you what they measure. (If you want to know more, check this overview paper: https://doi.org/10.1093/llc/fqx023)

Or you can take it from the horse's mouth: Burrows, J. F. (2007). All the way through: testing for authorship in different frequency strata. "Literary and Linguistic Computing", 22(1): 27-48.

You need to look this up?? It's a bog-standard method!

"Stylo does not specify which specific "measurements" are included in this calculation"
False.
I will just repeat what I said in >251 Petroglyph: Stylo "will also dump into your working directory a word list of all the tokens in your corpus; a table_with_frequencies.txt, which gives the frequencies for all the word tokens in your corpus per text; and a stylo_config.txt file, which lists the configurations: the dimensions of your graph, which analysis was performed, the distance measure, and much else besides. If you want to recreate a graph / analysis: here are all the parameters you need."

"The biggest problem to Petroglyph's entire "Lunch" experiment is that he did not even include a link to the raw numeric data, and just shared these visuals."
Rofl. Moving the goalposts, much? I told you exactly what settings to choose and provided you with the exact corpus.

Either way: Here is my config file for the Austen cluster. Here is the table_with_frequencies. Any spreadsheet software will be able to read those.

""Diana" actually has the "she" pronoun among its top-6 words, whereas the other two include "I" in the top-6;"
If you haven't understood yet why it is a mistake to take absolute frequencies as diagnostic features, then I don't know what to tell you.

"He is not even showing all of the text-to-text comparisons"
False! Those graphs are the result of all text-to-text comparisons. Check that table_with_frequencies! See those lines on the cluster graphs? They show the distance you'd have to cover to get from one text to the other. The longer the lines, the greater the distance / difference. Those lines are the text-to-text comparisons. Or rather, the result of each-text-to-each-text comparison.

How do you not know this?

"anybody who doubts these conclusions cannot check them to see what might have led to misattributions"
You keep repeating this. Saying it multiple times does not make it more true. I told you what software to use, under what settings, and gave you a corpus.

But alright. This is what Burrows' Delta calculates:
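In the usual textbook notation (this is the standard formulation of the classic Delta over the n most frequent words -- a sketch of what the measure computes, not a screenshot of Stylo's internals):

  \Delta(k, a) = \frac{1}{n} \sum_{i=1}^{n} \left| z_i(k) - z_i(a) \right|, \qquad z_i(t) = \frac{f_i(t) - \mu_i}{\sigma_i}

where f_i(t) is the relative frequency of word i in text t, and \mu_i and \sigma_i are that word's mean frequency and standard deviation across the corpus. In plain words: take the n most frequent words, z-score each one, and the Delta distance between two texts is the average absolute difference between their z-scores.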



Feel free to double-check by hand. Or use Excel.
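If Excel isn't your thing either, here is a minimal base-R sketch of the same calculation. Assumptions on my part: that table_with_frequencies.txt reads in with words as rows and texts as columns (transpose first if yours came out the other way round), and an arbitrary cut-off of the 100 most frequent words:

  freqs <- read.table("table_with_frequencies.txt", header = TRUE, row.names = 1)
  top   <- head(freqs[order(rowSums(freqs), decreasing = TRUE), ], 100)   # 100 most frequent words
  z     <- scale(t(top))                                       # z-score each word across the texts
  delta <- as.matrix(dist(z, method = "manhattan")) / ncol(z)  # mean absolute z-score difference
  round(delta, 3)                                               # pairwise Delta table, texts by texts

Every off-diagonal cell is the Delta distance between a pair of texts; smaller means more alike. That's all the "magic" there is.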

307Petroglyph
Dic 9, 2021, 8:06 pm

>305 faktorovich:
You are fixating on these gray areas
Because a consistent application of a methodology is at least as important as having a methodology. The decision to "count" 7 similarities or not shouldn't be up to the individual case (the researcher, in their deliberations, might be biased towards the option that would be most interesting). Either count them, or not.

You're sacrificing systematic application with a clear cut-off point in favour of spur-of-the-moment decisions that might go the other way on a different day.

You're always going to "miss out" on interesting cases, either by not counting interesting 7's, or by making subjective judgments (and introducing errors). Opt for rigour and systematicity every time.

"it would be unfair to disqualify all 7 matches without considering if a given text matches other texts"
Fairness has nothing to do with it. Introducing judgments based on what would be fair makes the method subjective and open to bias. Count all sevens, or count none. Sevens could be like your "supporting top 3 phrases" -- not part of the actual analysis but interesting if other features stand out.

Apply some conditional formatting to your table: anything 10+ gets an "important" colour. Saves you from doing things by hand.
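Or, if those 27-test match counts ever end up in R (the matches data frame below is purely hypothetical, since I don't have your spreadsheet in front of me), the cut-off takes two lines and involves zero judgment calls:

  flagged <- matches >= 10              # TRUE wherever a pair clears the threshold
  which(flagged, arr.ind = TRUE)        # row/column positions of every 10+ match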

""Old-fashioned" language? Just like comedians can mimic accents and "old-fashioned" language, so can most professional writers."

...So Corelli could not have affected archaic language, but her ghost-writer could?

Answer the question: If you find affinities between Corelli's texts and those published decades earlier, is it not possible that her archaic language causes these similarities, and not some unknown ghost-writer's archaic language? If the "professional writer" can affect archaic language, so can Corelli.

"The more I apply computational-linguistic tests to texts and research these authorship mysteries, the more convinced I am that there are no coincidental matches"
I can only respond with a shrug emoji: 🤷🏻

"The 729+ possible combinations you refer to is the reason my method can distinguish between the degrees of similarity."
Sure, let's go with that. So many possible combinations are left open after you replace "percentage of passive voice" and "average sentence length" and "average word length" with a binary opposition 1/0. Let's be generous and double it: I'll grant you 1500 different possibilities.

308Petroglyph
Dic 9, 2021, 8:24 pm

Faktorovich has uttered insinuations and accusations of underhanded and sneaky self-serving sampling techniques in >257 faktorovich: "This is especially the case if a researcher selects chunks just as politicians in the US gerrymander congressional districts so that the final result is more elected officials that favor the gerrymandering-officials preferences vs. the average political preferences of the entire population."

Either Faktorovich genuinely does not know how sampling is done, or she is making maliciously untrue statements.

In reality, sampling is done automatically, by the software, precisely to avoid bias like this. You tell the software how many random samples to take, whether they can overlap, what the size of the sample is, etc. And the software just does it. You don't tell the software what to choose; for a 500-word sample it runs a random number generator, starts at the word in that position, and grabs the next 499 words as well. Boom, random sample.
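In bare-bones base R, the mechanics I just described look roughly like this (a sketch only; full_text is a hypothetical variable holding one whole novel as a single string, and this is not Stylo's actual code):

  words      <- unlist(strsplit(tolower(full_text), "\\s+"))   # split the novel into words
  start      <- sample(length(words) - 499, 1)                 # random starting position
  sample_500 <- words[start:(start + 499)]                     # that word plus the next 499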

I'll even give you a step-by-step tutorial! Call it a pre-amble to a Lunch Break Experiment (tm).

Assuming you've downloaded R and RStudio.

  1. Open RStudio
  2. In the console (a ">" followed by a blinking cursor) type this or, rather, copy/paste it because it has to be exact: install.packages("stylo") and press enter. Wait for the process to finish.
  3. Next, type or copy this: library(stylo) and hit enter. This will activate that package for the current session.
  4. Type or paste this: CBronte_jane = readLines("https://www.gutenberg.org/files/1260/1260-0.txt")
    What this does is grab the text file at that hyperlink and save it in R under the name CBronte_jane. It's Jane Eyre.
  5. Type or paste this: my_first_sample = make.samples(CBronte_jane, sample.size = 1000, sampling = "random.sampling", sample.overlap = 0, number.of.samples = 1, sampling.with.replacement = FALSE)
    What this does is take the text known as CBronte_jane, extract a sample of 1000 words, saving it under the name my_first_sample, using random sampling. The samples do not overlap, the number of samples taken is one.
  6. type or paste this: my_first_sample and hit enter
  7. admire your random sample (R will print only the first few lines b/c 1000 words is a lot. If you want, you can have R create a text file with that sample in it.)
  8. Repeat steps 5 and 6 (either by recopying/repasting, or by pressing the up arrow until you see the command you want to run again). Every time, the sample taken will be different. That's what a random sample means!

These samples will start in the middle of sentences and the middle of paragraphs. Because they are random!

For multiple samples, just adjust the parameters in step 5 above:

  1. Type or paste this: samples_10 = make.samples(CBronte_jane, sample.size = 1000, sampling = "random.sampling", sample.overlap = 0, number.of.samples = 10)
    This takes ten 1000-word samples from Jane Eyre and saves them under the name "samples_10"
  2. Type or paste this: samples_10
  3. Admire your 10 random samples
  4. Repeat steps 1 & 2. Every time there will be a different set of random 1000-word samples
  5. In the next step, you can have R write these samples to a txt file (a minimal sketch follows right after this list).
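For that last step, something like this does the job (a sketch that assumes make.samples handed back a list of word vectors, which is what it has done in the stylo versions I've used; the file names are just placeholders):

  for (i in seq_along(samples_10)) {
    writeLines(paste(samples_10[[i]], collapse = " "),    # glue the sampled words back together
               paste0("jane_sample_", i, ".txt"))         # one file per sample
  }

Ten files, one random 1000-word sample each, ready for whatever you want to do with them next.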


By adjusting the parameters of the sampling formula, you can tell Stylo to take 100 samples of 500 words, or tell it that the samples may overlap (so that, say, the last 57 words of one sample may be the first 57 of another sample). Whichever is most appropriate for the set of texts you have in front of you.

Of course, this is not limited to just a single text. You can import whole corpora into R (say, the collected works of Dickens) and run samples on all of those texts simultaneously. And not just samples: you can compute word frequencies, emotional words, function words, ... across a whole bunch of texts simultaneously, and have the results printed in a nice table for further analysis. Just like that!
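For instance, something along these lines (another sketch: the function names are from the stylo documentation, but the "corpus" folder name and the choice of 100 words are assumptions of mine -- check the help pages if the arguments have moved around in your version):

  library(stylo)
  my_corpus <- load.corpus.and.parse(files = "all", corpus.dir = "corpus")   # every text file in the folder
  mfw       <- make.frequency.list(my_corpus, head = 100)                    # the 100 most frequent words overall
  freqs     <- make.table.of.frequencies(my_corpus, features = mfw)          # relative frequencies, one row per text

From there, freqs is an ordinary table you can inspect, export, or feed into whatever statistics you prefer.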

309Petroglyph
Modificato: Dic 9, 2021, 8:54 pm

I felt like doing another Lunchtime Experiment. It's really an after-dinner experiment. But they're both meals, so similar enough to count as the same thing.

This time round, I'll look at some twenty-odd novels from the Wizard of Oz series.

I cribbed this idea from section 3.3.1 of this paper: Gladwin, Alexander A. G., Matthew J. Lavin, and Daniel M. Look. 2017. ‘Stylometry and Collaborative Authorship: Eddy, Lovecraft, and “The Loved Dead”’. Digital Scholarship in the Humanities 32 (1): 123–40. https://doi.org/10.1093/llc/fqv026.

A pdf link for those blocked by a paywall.

Gladwin et al.'s account really is taken from the following paper: Binongo, José Nilo G. 2003. ‘Who Wrote the 15th Book of Oz? An Application of Multivariate Analysis to Authorship Attribution’. CHANCE 16 (2): 9–17. https://doi.org/10.1080/09332480.2003.10554843.

So. Let's look at The Wizard of Oz series. Its original creator L. Frank Baum wrote some 14 books in the Wizard of Oz series (plus some story collections and plays, but those don't concern us here). After his death, several people continued writing instalments for the series, most notably Ruth Plumly Thompson, who penned 20 or so. Wikipedia says that her style was very different, more fairy-tale like.

One of the books, The Royal Book of Oz, likely authored by Thompson, was originally published as "by L. Frank Baum", but was later credited to her.

So. We have two authors with, apparently, recognizably different styles. We have one book that has been attributed to each of them. Let's do a rapid and basic R:Stylo run. Two questions: Will the software correctly separate all the Baum stories from the Thompson ones? And which author set will Royal be closest to? Spoiler: Binongo found that it's probably almost entirely by Thompson. But Binongo (and Gladwin et al.) do some fretting about advanced methodologies, so I, noob that I am, will just work with the defaults in Stylo and see where that takes me.

I've downloaded 11 of Baum's books off ProjGut -- I just selected the 11 most popular ones. No real triage, because this is a Lunch Break Experiment (tm). I also downloaded 9 of the Thompson books. The Royal Book of Oz I've named "Mystery_royal". Just to make things interesting, I've also dumped Pride and Prejudice in the corpus folder, along with Jane Eyre, Corelli's The Young Diana, and Charles Stross' Accelerando, a 2005 collection of science fiction short stories released under Creative Commons and downloadable for free. Click here to download this corpus as a zip file if you want to recreate / double-check my results.

I've removed all introductions, ProjGut legalese, title pages, letters to my readers, summaries of other books in the series, etc. All books start at Chapter 1.
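(If anyone wants to repeat that clean-up on their own downloads, the crudest part -- lopping off the Project Gutenberg header and licence -- can be sketched in a few lines of base R. The marker strings below are the usual ones PG uses, though they vary a little from file to file, and the introductions, title pages etc. still need eyeballing by hand. The file name is a placeholder.)

  raw   <- readLines("some_gutenberg_book.txt")      # placeholder file name
  start <- grep("\\*\\*\\* START OF", raw)[1]        # first PG start-of-text marker
  end   <- grep("\\*\\*\\* END OF", raw)[1]          # first PG end-of-text marker
  body  <- raw[(start + 1):(end - 1)]                # keep only what lies in between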

Here is a cluster graph, run with the Stylo defaults: cluster analysis selected (on the Statistics tab), with the pronouns removed from consideration (under the Features tab).



Baum's books are in green, and they're correctly separated from all the Thompson books in grey. The Royal Book of Oz is lumped in with Thompson. So it looks like this rough and ready Stylo analysis (using Burrows' Delta) is in line with the more advanced techniques used by Binongo.

The three extra novels that are litfic/romance novels for adults are clustered together, and separate from the simple-language feel-good stories for children. As expected. The software thinks they are more like Baum's works than Thompson's. The software also thinks that Stross' science fiction is much closer to Thompson's works. Make of that what you will.

Conclusion: You really can compare any set of texts, and the tools employed will happily produce distance measures and cluster graphs and whatnot for whatever group of texts you ask it to cluster. Whether it is meaningful to do so or not.

Using the same settings, but with the noise removed:



Yeah: Thompson wrote Royal. (I also ran these tests with the pronouns left in and got pretty much the same results.)

What does this neat separation between the two authors look like on a scatter plot? (Settings: PCA (corr.) under the tab Statistics; pronouns excluded under the tab Features.)



I have nothing to add here, really. This graph is not substantially different from the previous one: they both take the same data, the same calculations as input; they're just different ways of visualizing these data.

Time to kick this up a notch! Let's try some sampling.

Instead of letting Stylo run Burrows' Delta against the entire text at once, let's extract 5 samples of 5000 words each. That produces the plot below (same settings as the last plot, but on the Sampling tab, choose random sampling, 5000 words, 5 samples, no overlapping.)



The actual text labels have become illegible, but the colours are enough: none of Baum's 5000-word samples have strayed over to Thompson's (the actual dots on the plot are at the midpoint of each label). None of Thompson's samples have made it over to Baum's side either. All of Royal's samples are comfortably within the Thompson cloud.

Let's do the same thing, but for 10 1000-word samples per text:



Across these latter two graphs, the clouds of samples spread out a little. This is natural: some samples will contain more dialogue, or more description, or perhaps even poems and songs etc. But the take-home message is: The differences between any of the Thompson samples are smaller than the differences between any of the Thompson samples and all of the Baum samples.

It's nice to have confirmation that a fairly standard measure like Burrows' Delta gives a good enough result even without sampling. This is probably why it is a standard measure in the first place.

310aspirit
Dic 9, 2021, 8:51 pm

>309 Petroglyph: Nice! Your Dessert Experiment (;-D) is really fun and makes me want to play with R and Stylo. Maybe this weekend.

311faktorovich
Dic 9, 2021, 9:00 pm

>306 Petroglyph: The paper you cited (https://doi.org/10.1093/llc/fqx023) does not include a list of features that is included in the Delta method, though it does mention various potential features a method could test. This lack of clarity is obviously intentional, or meant to make the method seem far more complex, with the mathematical formulas used in the description, than the simple test for common words that is obviously applied in it. This is confirmed by your statement "a word list of all the tokens in your corpus" - only word-lists are generated in the data because only word-frequency is tested.

Yes, I have seen spreadsheets before like your "table_with_frequencies". And no, "any spreadsheet software" cannot open this file, as Excel cannot open it; it only opens as a simple-text file where the rows are jumbled and difficult to interpret. This spreadsheet includes the names of the different words in the corpus and the statistical frequency of their relative appearance in different texts. The frequency of all possible words obviously creates a nonsensical data-set. What is missing in this data is how your system goes from this enormous data-set to the final seemingly simple data points on a graph - where does all this complexity go? What is the formula for selecting which of these similarities are more significant than others etc.?

You are just ignoring my points and again repeating that I just don't understand you by misinterpreting what you imagine I stated.

The graphs can be the results of fiction-writing. It is the researcher's and the software-developer's job to explain how raw data points are altered to simplify them into just a few data-points. Without this underlying specific formula of conversion, the researcher would be as well served by performing a magic trick without any science behind the transformation. There are no cluster "graphs" in any spreadsheet - it is a data-table, not a graph. There are no lines in your table; you are confusing the lines in your diagram with the data in the table. It is clear to all what your diagram is supposed to represent (the lines of similarity between texts); I am objecting that there is no rational explanation for how you arrived at this clustering conclusion.

The formula that you do list is absolutely absurd for the following reasons. 1. By the average z-score you mean the frequency of a given word in a text. There are thousands of unique words in most texts, so how can all this data be simplified into a single average? Are you adding up all frequencies for all words and calculating the average frequency between them? This would just be nonsensical for purposes of attribution. There is no weight given to any words in this formula. To clarify your formula: k = text; a = author; n = words; z = frequency score. You are saying that by manipulating the name of the text, the author, the number of words in the text, and the frequency score for all words on average, you get a single point of similarity-divergence on a graph of this text's similarity. There isn't even an author 2 or a text 2 in this formula. Is this the only formula this system uses to derive your attribution answers? Can you try giving an example of how this formula was applied for a specific comparison? It just seems to be nonsensical double-speak

312faktorovich
Dic 9, 2021, 9:16 pm

>307 Petroglyph: I am neither always discounting nor always counting gray-area matches, but rather using the data in the full corpus to interpret whether each of these cases is indeed interesting or relevant. In the table I posted, the gray areas did not represent only cases of matches, but rather the sections of the text-to-text comparisons that on average matched all of the texts in a given byline. You are concluding the 7-9s are uncertainties, whereas I did not include a write-up in my explanation that classes them as either matches or non-matches. I do explain these variations in degrees of similarity in detail in my Re-Attribution book, as well as all of the other points we have discussed in this chat. The color-coding system is designed to guide readers of the table, so they can reach their own attribution conclusions - they can choose to count 7-9 as matches or non-matches. If I had written an article about this data, I would have dived into these questions. And all readers of this discussion can take this data and write articles of their own that disagree regarding the 7-9 matches. The point is that most of the data consistently fits into the gray regions of in-byline matches, or can be explained by the intersections/byline-crosses, and the much larger non-gray portions are non-matches. You just asked me to test if my method works on the same set of texts you tested; and this data set proves that it does, as our findings matched in many of the cases; so to say that my conclusions are wrong, you would have to find fault with your own as well.

To evaluate the significance or lack thereof in archaic language that appears in Corelli's texts I would have to write several articles, starting by picking out most of the obvious examples of archaic language and searching for similar usage across British literature. Then, I would test all of the texts where similar language appears against Corelli's texts to check if there might be a shared ghostwriter between them. It is likely I would have to test several hundred texts and would have to explain the nature of archaic language in question etc. to arrive at a certain answer. But simply-put, both a professional writer and Corelli could affect archaic language; but as I mentioned previously Corelli is very likely to be a pseudonym and not the name of an authentic author given that some texts with this linguistic signature were published before she was born.

313faktorovich
Dic 9, 2021, 9:41 pm

>308 Petroglyph: I followed the steps you gave for generating random samples. I see 10 first lines, some of which have garbled letters, some are just blanks without the words I would need to find these points in the text to double-check the chunks of text this system has selected. Only 1 out of the 10 starts at the beginning of a sentence, whereas the others all start in the middle of a sentence, and one includes only one word, which would also be difficult to locate in the text for double-checking. These glitches are serious problems because if the system cannot distinguish where sentences/paragraphs start when it is supposed to, then it cannot be trusted to perform the rest of the attribution-calculations it is designed to perform. When I tried your second experiment for multiple samples, these errors of samples not starting at the beginning of new sentences etc. were multiplied. But my problem is not with these types of glitches that RStudio introduces, but rather with the idea of choosing "random" samples that change in each calculation, or of choosing any samples or chunks out of a text at all. As I explained, the larger a text-size is the more accurate are the attribution findings derived by testing it; so it is counter-productive or an error to break any text into its pieces unless a researcher specifically needs to solve a mystery regarding who wrote the different parts of a text suspected of having 2 or more authors. Have you noticed my objections to chunking? Given my stated preferences, it would have been more logical if you gave the instructions to test the various other features you mentioned, like "emotional, function words". Are these steps too numerous to list?

314Petroglyph
Dic 9, 2021, 9:50 pm

>311 faktorovich:
"does not include a list of features"

That's the overview article, lol. Check Burrows' own work, he explains it in detail. Don't blame me when you look in the wrong place.

This one, perhaps? HMU if you need a pdf.

Or, you know, look at the formula again. That is what Burrows' Delta calculates. I can't make it more explicit than that.

"And no, "any spreadsheet software" cannot open this file, as Excel cannot open it; it only opens as a simple-text file where the rows are jumbled and difficult to interpret."

Holy shitballs, lady. Really?

Open Excel. File > Open > navigate to the folder with the txt file > double-click the txt file. (you may set an option at the bottom of the "Open" window to "all files" instead of just excel files.)

Or: right-click the .txt file. Select "Open with...". Choose Excel from a list of programmes.

Or: open a new Excel document. Drag the .txt file into the Excel window.

" What is missing in this data is how your system goes from this enormous data-set to the final seemingly simple data points on a graph - where does all this complexity go? What is the formula for selecting which of these similarities are more significant than others etc.? "

And I have answered this question several times now. I refer you again to the Burrows' Delta formula above.
But if "opening a .txt file with a spreadsheet app" is beyond you, ...

You not understanding the output of a frequency table does not make the figures in that table wrong.

You keep moving the goalposts on this: "Show me that Stylo works and how it works". I do. "Petroglyph did not show his raw data", and when I do, you complain that you find the data hard to read. How is that my fault?

"Without this underlying specific formula of conversion, the researcher would be as well served by performing a magic trick without any science behind the transformation."

Look, if you're starting from the position that the kinds of statistics that are clearly far outside your comfort zone are inherently suspect, then there is no point in me trying to explain what they mean. You wouldn't trust me anyway -- you already don't (see your comments about elitist and gerrymandering and sneaky secretive academic cabals).

" There are thousands of unique words in most texts, so how can all this data be simplified into a single average? "
Ok, that explanation would actually require me to think pedagogically and to explain some statistics to you. That is not my job. It's not my responsibility to make sure you understand a widely-accepted and commonly used standard method in the field.

Also, you, whose method consists of manually replacing proportional measures with an either/or 1/0, are wondering how a great deal of variety can be collapsed into a few general variables? Wow.

"The formula that you do list is absolutely absurd for the following reasons."
Take it up with Burrows, then. You'd better inform Maciej Eder, too (one of the main people behind Stylo).

"Can you try giving an example of how this formula was applied for a specific comparison? It just seems to be nonsensical double-speak"
Way, way, way ahead of you. I've already linked you to Gladwin et al. in >231 Petroglyph: and >309 Petroglyph:.

315faktorovich
Dic 9, 2021, 9:54 pm

>309 Petroglyph: I have already demonstrated that my method works better than this standard method on a random set of texts of your choosing. There is no rational reason for me to spend the next 2 days applying my full range of tests to this particular research problem in which I have no scholarly interest. You have demonstrated in this explanation that: 1. removing pronouns confuses most of your attributions (this is why obviously the most-frequent words that tend to include pronouns should not be removed, and yet you have argued for their removal). 2. You have shown that 5,000/1,000 word samples jumble your results, but you have not stated what word-size the chunks are in your original graph. And if there are 2 different bylines in the central texts that chronologically make sense with the first author having died; there is no mystery in comparing them. It is possible your method came up with an erroneous attribution; if my method finds errors in your conclusions; you would say that my method is incorrect, instead of admitting that you have indeed made errors even when faced with detailed data in my spreadsheet that explains the error, as I have done in the "Lunch" experiment.

316Petroglyph
Dic 9, 2021, 10:06 pm

>313 faktorovich:

"I see 10 first lines, some of which have garbled letters,"
Yes. I told you in >308 Petroglyph: "R will print only the first few lines b/c 1000 words is a lot. If you want, you can have R create a text file with that sample in it." If you were to print all the contents of your samples, you'd lose easy and/or visual access to the rest of your code.

"some of which have garbled letters, some are just blanks without the words"
Gee. Would you agree that it is important to properly format texts before you run stats on them?

Have you made sure that character encoding was not an issue in your analyses? You do move a lot of text between various environments (txt, various webpages, html,...). But, considering we had to explain the difference between filetype and character encoding to you in this very thread, and considering that you've been doing this for how long now?

Also, why is this suddenly a deal-breaker when you don't even know if

"As I explained, the larger a text-size is the more accurate are the attribution findings derived by testing it; so it is counter-productive or an error to break any text into its pieces unless a researcher specifically needs to solve a mystery regarding who wrote the different parts of a text suspected of having 2 or more authors. "

This has been explained before: Measuring on a single long text gives you one measure. Slicing the text into smaller sections that are still of usable size (so, 3k - 10k words) gives you multiple measures for a single text. It makes your descriptions more precise!

Texts have a lot of internal variety! (To take a stupid example: Lord of the Rings. Long descriptions. Plenty of songs and poetry. Chapters that are mainly dialogue.) It's about leveraging all the evidence that you have at your disposal.

"so it is counter-productive or an error to break any text into its pieces unless a researcher specifically needs to solve a mystery regarding who wrote the different parts of a text suspected of having 2 or more authors.

What do you mean, "unless"? Texts are sliced into samples for data reasons. Authorship is only one task that corpus studies and computer linguistics can do. If a text is not by multiple authors, we've just got a more precise idea of one author's language patterns.

"Have you noticed my objections to chunking?"
I have noted them. Your objections to such a trivial operation performed on corpora of multi-million, multi-tens-of-millions of words (and sometimes individual texts) mark you as an outsider, as someone who hasn't really worked with lots of data before.

Your lack of familiarity with data hygiene (encoding, using .docx and .xlsx), as well as your not knowing how to pull a .txt file into a spreadsheet does that, too.

317faktorovich
Dic 9, 2021, 10:08 pm

>314 Petroglyph: I looked at the article you cited, and features were not listed in it. The new article you linked to has a pay-wall.

For the second time, the directions you gave actually worked. I am surprised. Yes, the problem was that I had not selected "all files". However, opening the file in Excel in a format that was not entirely jumbled did not help to convince me that this data set makes any rational sense for attribution purposes. There are 5,000 different words with their frequency rates in the texts where they appear. The table is only for the "Austen" texts, and not for all of the texts. The table would have been a lot more cluttered if these rates were given for all 16 texts. Since you asked me to test 16 texts, it would be fair for you to share your full data for the full set. It remains unclear how these wildly spread-out individual frequencies can be combined in any rational mathematical way to create an attribution answer; it certainly cannot be done with the main formula you have included.

I did not mention any cabals. Gerrymandering is a very real problem that is accepted as real by academia and media alike. If you are saying that gerrymandering is equal to cabals... then you protest too much.

You are welcome to inform Burrows, Maciej etc., and they are welcome to join this discussion and explain it all to us. That would be exactly what we need to figure this out.

318faktorovich
Dic 9, 2021, 10:21 pm

>316 Petroglyph: I followed your exact steps in Stylo and this generated the errors. So if there were additional steps for cleaning up the data; why would you omit those. And if, once I have created an edited file, I would not be able to use it in Stylo (as I have discovered earlier), then only online-accessible texts can be used with this software, and thus they will always have these serious problems in them that will introduce attribution errors.

Unless you are claiming the songs in "Lord" are by a different author vs. descriptions vs. dialogue, for a total of over 3 authorial signatures; it would make no sense to test these components separately as genuine linguistic signatures record intuitive language usage, and not usage that changes with the genre/writing-type.

The software used for attribution should not be overwhelmed by data pieces of over 5,000 words; if it is, the software designer needs to fix this so that the software can handle the full size of the textual materials. My Renaissance corpus has 7.8 million words. So, if the size of the corpus means I am an outsider vs. insider...

All of your objections are deflections from the central problems with your method, problems you have not addressed because you know they are likely to be the cause of systemic misattributions. I have proven my method works better. Yet you have not emailed me to ask for a free review copy of the 2 volumes (698 pages) in my series where I explain how this is the case in several chapters. I cite some of the articles you have mentioned in this discussion, discuss the Stylo method, and explain its faults in this book. Any researcher who refuses to consider alternative methods in a given field is doing a disservice to the progress of science.

319Petroglyph
Modificato: Dic 9, 2021, 10:22 pm

>315 faktorovich:

1. removing pronouns confuses most of your attributions

What? All the graphs in >309 Petroglyph: are based on calculations with the pronouns left out. Look at the bottom of those graphs. That parameter is stated right there.

If I try with the pronouns included, I get this:



How does this invalidate the results in >309 Petroglyph:???

(this is why obviously the most-frequent words that tend to include pronouns should not be removed, and yet you have argued for their removal).
... their removal when you're comparing 1st-person and 3rd-person books, yes.

2. You have shown that 5,000/1,000 word samples jumble your results,
Lolwut. Not a single slice of Thompson was assigned to Baum, or vice versa. That confirms the initial results.

but you have not stated what word-size the chunks are in your original graph.
No chunks there. Full, complete, unabridged novels.

And if there are 2 different bylines in the central texts that chronologically make sense with the first author having died; there is no mystery in comparing them.
Wut? What "texts" are you talking about? Only one of the texts in that corpus has been assigned two bylines: it was originally published under Baum, later under Thompson. Because she wrote the damn thing.

"It is possible your method came up with an erroneous attribution
Sure, why not. I'm not perfect. Thing is, my results match Binongo's. Take it up with him, old sport.

instead of admitting that you have indeed made errors even when faced with detailed data in my spreadsheet that explains the error, as I have done in the "Lunch" experiment."

Well, you think my method is wrong. Few other people here seem to think so. The Lunch Break Experiments (tm) -- please get that name right, it kinda means a lot to me -- have confirmed earlier experiments run by experts. And you disagree with all of them.

Point is: I think you're wrong in calling my method "absolutely absurd". You think everyone else is wrong in calling your method, um, misguided and wrong.

320Petroglyph
Modificato: Dic 9, 2021, 10:39 pm

>318 faktorovich:
I followed your exact steps in Stylo and this generated the errors. So if there were additional steps for cleaning up the data; why would you omit those

Look, I'll be frank. I offered those steps since they were the only way I could guarantee that you'd have access to that text. In previous posts you've expressed unhappiness at not being able to point R at your files.

But if you must: nothing prevents you from taking the cleaner CBronte_jane text that was in the corpus for the first Lunch Break Experiment (tm). Odd characters should disappear there.

Either way, the fact that an improperly encoded text causes issues is not a reason to dismiss random sampling.

Unless you are claiming the songs in "Lord" are by a different author vs. descriptions vs. dialogue, for a total of over 3 authorial signatures; it would make no sense to test these components separately as genuine linguistic signatures record intuitive language usage, and not usage that changes with the genre/writing-type.

No, what I'm saying is that poetry and dialogue and descriptions have different patterns of punctuation and content vocabulary and function words. If you slice LotR into manageable chunks, by the laws of chance some of those chunks will have more poetry in them than others, and so they will be plotted in slightly different positions than slices without poetry.

I'm not talking about "testing for authorial signatures" -- how is that the only use you can see for any of these technologies??? I'm talking about handling data and the things you can do to corpora to prepare them for analysis.

"not usage that changes with the genre/writing-type"

Uh, yes? Just one example should demonstrate this: Dialogue will have a greater concentration of punctuation marks than descriptions or more introspective passages. Several punctuation marks are part of your model, too. Including commas. Many pieces of dialogue end in a comma, before a saying verb, for instance. Also, authors will give their characters their own voices. That might confuse metrics, too. Poetry (especially renaissance poetry and 17thC poetry) will incorporate and play with older speech patterns. And the puns! And all the double meanings!

An author's signature is not 100% reliably traceable through all genres and text forms and styles. That's why you use differently-weighted parameters to test different genres.

Whether or not you agree with that last statement, you must see that the things you look at to trace authorial signature (e.g. punctuation) can change depending on character or text type, right??

321prosfilaes
Dic 9, 2021, 10:40 pm

>299 faktorovich: There is no bias in my re-attribution of the British Renaissance to 6-ghostwriters. I am simply reporting what the data says.

The data doesn't and can't say that the same person wrote two works. All it can say is that there's a certain degree of similarity in the style of the two works, in the ways the tests measure the style.

>301 faktorovich: "Falsifiability" is not a "key characteristic" of all science. The gravity equation g = GM/r2 might be something that cannot be proven to be wrong

Newton's theory predicts the orbits of the planets very precisely. It gets Mercury's orbit wrong by 38-42 arcseconds per tropical century.* This is a very famous test of Newtonian gravity that showed that Einsteinian gravity is more accurate.

* https://en.wikipedia.org/wiki/Tests_of_general_relativity#Perihelion_precession_...

322faktorovich
Dic 9, 2021, 10:42 pm

>319 Petroglyph: The blue vs. green colors distinguish cross-byline texts in the diagram, but they exist in the same portions of the graph. And if you have defined these two bylines as distinct, you have biased the findings to favor this two-sided split.

If the first image does not include chunks and it is the most clearly distinguished out of them; this proves that the chunking method into 5000/1000 pieces only confuses the attribution results; so chunking should not be used.

I think you are repeating what I am saying about the 2 bylines, but due to exhaustion at this hour you seem to not recognize we are saying the same thing.

My method is entirely transparent and I have provided basic steps that are enough to guide a user through it. This is why you have not had to ask me to explain the steps in my method; whereas I have spent days trying to get you to offer a full list of steps involved in your method, and you have not yet been able to provide such a list. Yet again, most of the attribution results of my method matched the results of your method; so to say that my method is entirely "wrong", you have to find your method to be at least "wrong" in the attributions where our methods agree. My method is the best approach available with modern accessible computing for establishing authorial-attribution. This is a statement of fact; I have reviewed the Stylo method and most of the other previous approaches before designing my own method. I would not have designed my own method if I thought there were any working methods in the market. I have questioned you about your approach because the points I have questioned you on are those that prove the error of this approach, especially when they are explained in more and more detail as you have done in this discussion. Most articles abbreviate these explanations and their researchers are never questioned to explore the underlying problems I have focused on. I discuss these problems in the book I already published; nothing you have said has been new to me (including the Excel file glitch that I now recall I solved when I last received this type of Stylo data-set). The point of this discussion for me is to show the public the type of falsehoods computational-linguists have been claiming that have prevented attribution studies from progressing from pre-computing intuition-based attributions.

323faktorovich
Dic 9, 2021, 10:49 pm

>320 Petroglyph: Because my method is more accurate, it does not fail to distinguish the authorial style even if the genre changes between poetry and non-fiction. I have explained before that groups like Verstegan's in my Renaissance corpus have drama, poetry, non-fiction etc. in them and yet they are all correctly identified. The method has to be such that genre is irrelevant. My data table for the 284 texts includes the exact comma etc. rates in the texts together with the genres these appear in. It is illogical to generalize about poetry etc. without citing specific texts where you have observed this division.

324faktorovich
Dic 9, 2021, 10:50 pm

>321 prosfilaes: Yes, this is also proof that my new method of computational-linguistic attribution is the more correct phase of scientific discovery because it falsifies previous approaches, and corrects them.

325Petroglyph
Dic 9, 2021, 11:43 pm

>322 faktorovich:
"The blue vs. green colors distinguish cross-byline texts in the diagram, but they exist in the same portions of the graph"

What are you on about? The green book (or the five or ten green slices in the sampled graphs) is the book whose authorship we're trying to determine. The green book in its entirety falls well within Thompson's cloud. And so do all five 5k-word samples. And so do all ten 1k-word slices.

If any of these segments were majority-Baum, the software would have placed them in his half of the graph.

"the first image does not include chunks and it is the most clearly distinguished out of them. this proves that the chunking method into 5000/1000 pieces only confuses the attribution results; so chunking should not be used"

... because there's only 21 labels on that one, one per text? The next one has five samples per text, so 21 * 5 = 105 labels, and the next graph has 10 samples per text, so 21 * 10 = 210 labels. Each chunk is tested by the software and assigned an attribution, and so is plotted on the graph...

Would you like a bigger graph? Here's a 5000x3114 pixel cluster graph of 5 chunks per Oz book. I can make that bigger, if you like.

And here is a graph with 10 samples per individual book, i.e. 210 chunks that are plotted as just dots. Red for Baum, green for Thompson, blue for the text that has seen two bylines.



Either the limited legibility of every single letter in that graph is genuinely some sort of OCD thing you can't overlook, or you're just grasping at improbable straws to deny those results. Do you happen to be neurodivergent?

"I have provided basic steps that are enough to guide a user through it"
If your car won't start:
1. Take hammer
2. Hit own head with hammer
3. Repeat until car starts.

Being able to provide simple steps that others can follow is not a shorthand for your methods being correct.

"you have to find your method to be at least "wrong" in the attributions where our methods agree"

Broken clock and all that. And not all of your steps are wrong. You just look at plenty of the wrong parameters, and you conflate everything like whoa.

"Most articles abbreviate these explanations and their researchers are never questioned to explore the underlying problems I have focused on"
False. Just take a look at that Gladwin et al. article. The bulk of it is trying out various methods on training data to see which one is the best. The literature is full of criticisms of older methods, and new ones are constantly being refined.

If you complain that regular, run-of-the-mill "here's a problem, let's solve it"-type articles don't bother to explain all the basic steps of the field in language that any layperson can understand.... You don't have much exposure to scholarly literature of this type, do you?

Your comfort zone is not an appropriate measure for the correctness of a method.

"due to exhaustion at this hour "
Nah, I've had students worse than you.

I am signing off now, though. I'm going to watch another kind of noob be all cute and inept.

>315 faktorovich:
"There is no rational reason for me to spend the next 2 days applying my full range of tests to this particular research problem in which I have no scholarly interest."

Oh no, absolutely, that goes without saying. I'm not expecting you to try and replicate those results. Or whatever. I don't want you to waste your time on this, either.

326faktorovich
Modificato: Dic 10, 2021, 1:29 am

>325 Petroglyph: Breaking your text into smaller chunks and making a new graph with them does not address my question. If you just answer questions with nonsense and draw more detailed pictures that represent nonsense, it is all still nonsensical. A lack of sense in an explanation means nobody can understand it; if there was sense in the logic at least somebody other than the speaker should be able to understand it.

Yes, your steps have indeed been similar to hitting your own head repeatedly with a hammer. I don't know why you have felt compelled to tell me about this process. I asked you about your computational-linguistics method.

After running several experiments, I have learned which parameters work for which period etc. The parameters I choose do not produce genre-specific errors, for example, as yours do.

Most of the articles you have cited include the same repetitive book-review section that summarizes the same basic elements of major computational-linguistic studies of the past, and then they all fail to define their own method with any specificity that is sufficient for not only reproducibility, but even for scholarly questioning of the basic premises.

The last time I played a game or watched a tutorial about a video game was a couple of decades ago. I went to the ALA convention and tried selling my graphic design skills to design board games to a group of video gamer builders recently, and they insisted that if I was not an avid gamer they would not consider me as a potential collaborator. Computational-linguistics like board game design is not an activity for the inept or for fun-seekers, and if I was inept, this conversation would not have hit 3-article-sized point. I have to translate the next 14 volumes of the Renaissance series after I finish trying to sell these 14 volumes. Digressing into attributing the rest of human knowledge is definitely a "waste of time". If this was a game, I won the experiment, and you are refusing to admit defeat by down-talking your opponent.

327anglemark
Dic 10, 2021, 3:55 am

if I was inept, this conversation would not have hit 3-article-sized point

I simply do not know what to say. If you get everyone to tell you at length how you are failing, and trying their best to explain to you how, it's a sign that you have succeeded?

OK.

328MarthaJeanne
Modificato: Dic 10, 2021, 4:10 am

>326 faktorovich: I'm glad for you that one person thinks you won. Even if you are that one person.

329WATSONPK
Dic 10, 2021, 4:13 am

Questo utente è stato eliminato perché considerato spam.

330spiphany
Modificato: Dic 10, 2021, 6:22 am

>321 prosfilaes: The data doesn't and can't say that the same person wrote two works. All it can say is that there's a certain degree of similarity in the style of the two works, in the ways the tests measure the style.

This, a million times.

Such results admittedly sound rather banal in comparison with a discovery that entails rewriting literary history.

331susanbooks
Dic 10, 2021, 10:26 am

>326 faktorovich: "I went to the ALA convention and tried selling my graphic design skills to design board games to a group of video gamer builders recently, and they insisted that if I was not an avid gamer they would not consider me as a potential collaborator. "

So you've experienced rejection because of your lack of education (about games in that case) before. If gamers found your work too naive to be useful, why are you surprised that statisticians, readers, & literary scholars in this thread respond similarly? Computers are fun but by themselves they don't produce evidence. The analyst interprets data as evidence. In other words, the analyst-interpreter (you) affects the results. And, as with gaming, you're once again misled by the Dunning-Kruger effect.

332Petroglyph
Dic 10, 2021, 11:24 am

Goodbye, Faktorovich. This has been such sweet sorrow. I hope you find ways to spend less of your life copying and pasting stuff in and out of web pages. May you expand your comfort zone and become a life long learner.

Peace.

(btw. You asked for the frequency table for all 16 novels in >317 faktorovich:. Here is a zip file with the table_with_frequencies, the stylo_config file, the wordlist used in the calculations, and a table with the weighting per corpus file. Enjoy!)

333faktorovich
Dic 10, 2021, 1:25 pm

>331 susanbooks: The claim that the problem was that I was not a gamer was a façade to hide a rejection based on my gender, as a woman attempting to enter the male-dominated gaming industry. They were not asking me about my knowledge of gaming, but rather about my practice of gaming; a designer has to know how to design, not waste their time gaming instead. "Computers are fun" - you are just saying random unrelated stuff. Computers "by themselves... don't produce evidence"? You mean like a computer that has been turned off - producing evidence on its own AI power? I know computers don't produce evidence themselves. That's why I invented a method where I use a computer to create the evidence. Before data can be interpreted, the correct data parameters have to be collected in a rational and systematic manner that solves the problem and leaves behind evidence of what the collection process was, for others to check it for errors. My process is thus correct, while simply plugging the data into the Stylo software, despite the various glitches this program has, is incorrect. There was no bias (Dunning-Kruger effect) either in my attempt to design board games (as I had a successful track record of designing over 300 books that qualified me, and the only bias was the gender bias on the other side), nor in my logical interpretation of the data I have gathered about authorship in the British Renaissance.

334faktorovich
Dic 10, 2021, 1:31 pm

>332 Petroglyph: At the top of your "wordlist" file it says: "# This file contains the words that were used for building the table # of frequencies. It can be also used for further tasks, and for this # purpose it can be manually revised, edited, deleted, culled, etc. # You can either delete unwanted words, or mark them with "#"" This is the definition of bias: the 5,000 words your "Lunch" experiment tested for were not a randomly-selected list, or a list of the top most-frequent words in all of the texts in the corpus, but rather a list you manipulated with "edits, deletions" etc., as you explained in your comments about pronouns. You check the results of a random test before deliberately removing the words that lead your conclusions to anything other than a 95%+ re-affirmation of the existing bylines. This is how your outputs are skewed to appear to show correct results. If your method was fair, rational and scientific, there would not be a step where the wordlist could be thus unfairly adjusted.

335Petroglyph
Dic 10, 2021, 2:09 pm

>333 faktorovich:
"not waste their time gaming"
I wonder why they didn't let you design things you clearly think are beneath you. Must be their prejudice.

>334 faktorovich:
"a list you manipulated with "edits, deletions" etc., as you explained in your comments about pronouns"

How much time do you think I spent on those things? Do you think I have the time to trawl through 5000 words and check every one?

Read what that file says again, but more slowly. If your text contains words you don't want the software to count in its frequency tables, you can put a "#" in front of it, and on the next run-through the software will skip all the words you've marked as irrelevant. If you've pulled your texts from ProjGut, you may want the software to skip words such as "Gutenberg", for instance, or "tm". Another use-case is that, for plain-text versions, the place where html and epub versions have illustrations often just say *Illustration* -- those clearly aren't something you want to take into account. Now, you can either a) write a little program that deletes all of the occurrences of *illustration*; b) you can scroll through all of the texts in your corpus and delete them manually, or c) you can tell the software to ignore the string *Illustration* altogether by hashtagging or deleting it in one central location.
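Purely by way of illustration -- these are not lines from my actual wordlist file, just the shape of the thing:

  the
  and
  of
  # gutenberg        <- hash-marked: skipped on the next run
  # illustration     <- hash-marked: skipped on the next run

Everything without a "#" keeps being counted; everything with one is ignored.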

It says you can. Not that I have done this. Very important distinction, that. The distinction between truth and deceit.

I'm getting tired of your unfounded accusations of me manipulating data. I don't do that.

I will now prove to you that you are making false accusations.

If I told the software to ignore words by hashtagging them, you should be able to find those. Go on. Check: open that file with notepad, press ctrl+f and search for #. The only place you'll find # is in that notice you copied in your message.

I did not delete any words, either. If I did, there'd be fewer than 5000. Copy-paste that word list into a spreadsheet and see how many rows there are. You'll find there are 5000. No deleted words.
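(If anyone wants to run those two checks without a spreadsheet, here's a tiny sketch -- the filename "wordlist.txt" is just a placeholder for whatever the software exported:)

```python
# Sketch: verify the exported word list -- no hashed-out words beyond the header
# notice, and exactly 5000 entries. "wordlist.txt" is a placeholder filename.
with open("wordlist.txt", encoding="utf-8") as f:
    lines = [ln.strip() for ln in f if ln.strip()]

words = [ln for ln in lines if not ln.startswith("#")]   # "#" lines are comments
print(len(words))                                        # should print 5000
```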

Go on. Check. Then come back and apologize for your baseless and malicious accusations of me deliberately skewing my results in order to make you look bad.

336AnnieMod
Modificato: Dic 10, 2021, 5:41 pm

A question that keeps getting ignored despite a few people hinting at it so I will just ask.

A lot of the tests in this methodology are quantitative - down to punctuation, spelling of words (comparison of usage does not account for various spellings as far as I can tell), abbreviations and other similar elements which are as much a feature of an author's style as they are of editors', printers', and publishers' house rules. And with the texts of the British Renaissance that's even more pronounced because of the fluidity of both punctuation and spelling.

All tests seem to have been run on the texts from the printed corpus. Which is fine as long as the deviations are taken into consideration before looking for similarities. Not all manuscripts are available (unfortunately) but there are enough of them in various libraries and institutions which can be made available to scholars (or are sometimes available on microfilm or even in a legible transcribed "as is" copy). How many of those manuscripts, and from how many different authors (as per the current attribution), have been inspected and compared with the texts available in the corpus to find out the differences in these elements between the two texts - thus allowing these differences either to be discounted if minimal or to be taken into consideration when comparing the texts? Or are there any studies that have done that comparison before where possible?

337andyl
Modificato: Dic 10, 2021, 3:22 pm

>333 faktorovich:

I am going to call bullshit on that somewhat. Why should a video-game company pick someone who knows zilch about gaming to design a boardgame, who hasn't had any experience of designing boardgames, who doesn't play boardgames or video games, rather than go to one of the hundreds of experienced companies and individuals out there who know both video-gaming and what works in a boardgame design today? Boardgames today are not, or at least the good ones are not, really like most of the mass-market boardgames that people think of.

But of course you were not talking about game design, you were talking about graphic design - there are absolutely thousands of commercial artists and graphic designers out there, all with portfolios, many with experience of practical design in the field of board games. Without a history of previous work or a portfolio of work*, no one is going to employ you in that field. This is pretty normal in many fields of work.

* I have seen some people get work through a portfolio of their redesigns of existing boardgames. But that redesign work has been public and widely praised.

338MarthaJeanne
Dic 10, 2021, 3:30 pm

I don't see how you could do design work on games if you don't game yourself. It's one of those areas where unless you have been there yourself, you will not see the problems. It's no good having a game that looks beautiful if it's not playable.

339Keeline
Dic 10, 2021, 4:29 pm

>336 AnnieMod: For most works it is rare for there to be a single manuscript (or typescript). In the 20thC examples in my field the norm was to compose on a typewriter and then there would be hand corrections, usually by the editors paying for the story (book packager like the Stratemeyer Syndicate or publisher).

On some occasions, after a marked-up draft was returned to the author, portions or all of the typescript would be returned to the author for rework.

I have seen, and we have, some examples where two or more drafts of a typescript are extant. The initial one can have hundreds of correction marks in a couple hundred pages. This becomes a lot of work for the editors and may be a reason not to ask a given writer to work on later volumes in a pseudonymous series.

Here is an example from our collection for Trixie Belden #22 Mead's Mountain. It is from the mid-1970s. In this case there are colored pencil marks where each editor made notes on the typescript.

In some cases the edited manuscripts have large insertions by an editor that were intended to replace something that was not desired from the author. This kind of change would tend to skew a bona fide authorship analysis. Yet you only know it occurred if the manuscript with corrections is at hand.



This is a fairly easy example since the first draft was typed. When the manuscript is holograph (handwritten) then things get much harder since there is the issue of chirography (reading the handwriting). Some authors are notoriously bad about punctuation and this was often left to the publisher and sometimes generations of publishers who engage in copy editing to make the book conform to the publisher's style. For these and other reasons, I am concerned about relying too heavily on details like punctuation (exclamation points, etc.) as a marker of an author's style. Often it comes down to what the publisher will allow.

James

340MarthaJeanne
Dic 10, 2021, 4:38 pm

>339 Keeline:
>335 Petroglyph:
>321 prosfilaes:
>285 lorax:
and others
It's been a real joy to have voices in here that know what they are talking about, and are able to write about it clearly.

341AnnieMod
Dic 10, 2021, 4:40 pm

>339 Keeline: We are talking about the 16th century here - not about the 20th though.

I completely agree that 20th century manuscripts are a question of interpretation (and there are many versions around depending on what you count). There are sometimes more than one for the 16th century books as well, but nowhere near as many (for obvious reasons), and even when there are more than one, my point still stands - if that manuscript's quantitative analysis is materially different from the published text and the published text is used for author attribution, is the attribution for the writer or for the editor/printer? Add to that the lack of rules in both spelling and punctuation, and those differences may multiply just because whoever sets the book has a different idea of spelling or using punctuation.

Which is why I asked how many texts had been verified against the existing manuscripts or if such a study had been done before and if so by whom.

342bnielsen
Dic 10, 2021, 4:50 pm

343timspalding
Dic 10, 2021, 5:00 pm

>340 MarthaJeanne:

The way this topic eventually went, and the people who brought it there, is everything that's good about the LibraryThing community.

344Keeline
Dic 10, 2021, 5:16 pm

>341 AnnieMod: I think that the problems of older manuscripts multiply the problems of simple examples like the one I showed. Reading just letters from some people can be quite a challenge, let alone full book manuscripts in holograph.

There are cases where the typesetter/publisher have misread a manuscript and put the wrong word in. Sometimes this is caught at the galley stage. Other times it is only caught and corrected after publication. When an error is corrected early in a book's publication history, it can become a "point of issue" — a way to distinguish between copies with the error and ones where it has been corrected.

I think that most textual analysis is done on the published versions, after the editors have had their input. Sometimes it is generations of editors. Thus it is hard to get as close to the author as one might like.

Even when there are manuscripts extant and available, getting these into machine-readable text requires a lot of work and it is probably only rarely ever done because of this.

There are no manuscripts for Shakespeare's plays, for example. The Quartos, many of which were published during his lifetime, are probably all unauthorized and thought to be the result of someone with a good memory in the audience writing down the lines or a person acting one role and doing his best to remember the rest. Most of what is called Shakespeare comes from the four folio editions. The first of these was published in 1623, seven years after his death in 1616, by his friends. Even the selection of plays included varies a bit from edition to edition. Some plays are only known from their inclusion in the folio editions.

Since it was not possible to have the type remain "standing" for the life of one or more editions, it was fairly common to reset the type for each new edition. This can account for variations in spelling and typesetting from one to the next. Technologies of reproducing page plates like stereotype and electrotype were of the 19th Century.

If you go to an online text, which will you get? My wife has a Ph.D. in English with a specialty of Shakespeare so my limited knowledge is based on things she has said over the years. Lately she directs open readings of plays. A scene is cast on the spot (in person or via Zoom lately) and people read from the versions they have. But it is very obvious that the available printed or online copies vary a lot, including to whom lines are attributed and whether sections of scenes are included at all. This is why the performers in a production normally have to all work from the same copy.

What effect would this variation have on textual analysis? The immediate answer is: quite a bit.

We might say that the First Folio of 1623, if carefully transcribed, is the gold standard but this is after William Shakespeare of Stratford was dead and buried so he was not around to provide any edits. The plays in his lifetime probably were adjusted over time when reperformed. Some look to plays like Macbeth and wonder if some scenes were added at a later date and quite possibly not by Shakespeare.

How many 1500s and 1600s literary and dramatic manuscripts are available to consult and analyze? Probably there are very few. So it is probably a safe bet that this is not part of the usual or new textual analysis because it is so hard to get the material prepared for that analysis.

But I should stick to the 20thC series books, which have been my area of specialty for 33 years as a collector, bookseller, researcher, and writer of popular and academic articles and presentations. It's not my day job but it is my long-time avocation.

James

345AnnieMod
Dic 10, 2021, 5:46 pm

>344 Keeline: And if a study which attempts to prove that a huge number of works have been mis-attributed does not do the work of looking at the available manuscripts, what kind of study would do that? The whole thesis here is that everyone has been wrong for a very long time, after all. Proving that any similarities based on some of the elements are actually meaningful is not a useless exercise, even if it is far from trivial.

If the answer is that all the work is done on the printed versions available for machine analysis, that's fine. If the answer is that none of these texts (or texts by the same authors) actually have an available manuscript, then that solves that question.

Shakespeare is... Shakespeare. But his troubled publishing history is one of the reasons why I am wondering how definitive the comparison of punctuation, abbreviations, and so on in printed texts of the period can be, and how much they really show an author's style... :)

346faktorovich
Dic 10, 2021, 5:56 pm

>335 Petroglyph: You previously said several times that you delete pronouns from your tested words, now you are denying "trawling" through the words to screen out words you view are inferior for attribution? So then the words you choose are the most frequent 5000 words in the texts? Such a large top-words list will include most of the words in any given text given the high frequency of the most common words. Comparing such a large quantity of words against each other would create a 5000X5000X#of-texts dimensional mathematical space unless there is a trick that simplifies all this or evaluates something simple - or your formula that obviously does not fit your description of your methodology. You are just repeating that you do delete some words with the # mark as "irrelevant", while also insisting that you do not "trawl" through them to delete them. Why would you keep "Project Gutenberg" or "Illustration" in your text-file. I always delete these elements during the glitches-editing stage when I clean up files to prepare them for testing. If you fail to delete these elements, your word-total for the text is wrong, among other things. My accusations against you skewing results are based not only on my previous research of data manipulation in most published computational-linguistic studies, but also in the various contradictory and false remarks you have made across this discussion that I have pointed out. I have not referred to general falsehoods in your case, but to specific instances. You have ignored most of my criticisms, except for those that you think you can explain. Your failure to even explain the formula you said you are using is just one example of how you are not making a full disclosure here of your method, and if you did, even more falsehoods would become visible.

347faktorovich
Dic 10, 2021, 6:07 pm

>339 Keeline: My conclusion about the British Renaissance is that a Ghostwriting Workshop of six worked collaboratively to create it; a portion of the texts have a very strong single authorial signature, while another portion have two or more authorial signatures. My data tables show the degree of collaboration and between whom collaborations took place in each text. As I mentioned earlier, at least three of these ghostwriters were publishers (Verstegan, Byrd and Jonson), and as part of these roles they would help prepare texts for publication by not only typesetting or organizing the typesetting process, but also editing handwritten manuscripts for print. This is why there are occasional very minor tertiary signatures in some of these texts, where the collaborator played an editorial role, and did not write a significant portion of the full text. The size of my corpus (284 texts) and the 27 different tests applied to it makes it easy to pinpoint even these minor editorial contributions. Only if changes are too minor for tests to register them would editorial input not be noticed by my method, and in those cases the collaboration effort is too minor for an attribution.

348susanbooks
Dic 10, 2021, 6:08 pm

>343 timspalding: "The way this topic eventually went, and the people who brought it there, is everything that's good about the LibraryThing community."

Yes! If nothing else, this thread has demonstrated how many smart, patient, articulate people participate in these forums. The fact that they all list libraries I can browse for book recs is wonderful. Huzzah LibraryThing!

349amanda4242
Dic 10, 2021, 6:08 pm

>343 timspalding: Couldn't agree more. And I love that LT supports such a thread rather than shutting it down and castigating members for daring to rebut the work of an author.

Petroglyph, let us know if you ever decide to publish your Lunch Break Experiment (tm)! It could take off like the For Dummies series!

350faktorovich
Dic 10, 2021, 6:28 pm

>344 Keeline: I explain that there are "Shakespeare" (i.e. Percy) play manuscripts or those that match "Shakespeare's" handwriting that have been either pointed out by other scholars or by me in my Re-Attribution Volumes 1-2. Actors only learn their own lines, and not the entire play. And troupes were staging new plays in as little as a week or a few days (according to "Henslowe's Diary"), and it would be impossible for any human to memorize a new play (17,000-30,000 words) in a few days; you must be imagining the modern play production process where a troupe spends a year or more repeating perhaps the same re-running play, or at least gets to rehearse for months before the first staging; after all that repetition, a play might be mostly remembered, but try sitting down for 2 days and remembering as much of any play as you can, and you will have an estimate of how unlikely this is. Around half of the "Shakespeare" plays were published decades before the first folio added over a dozen plays in 1623. I explain the complexities of this etc. across the series. The heavy edits between "Hamlet's" first 1603 and second 1604 quarto are explained in my translation of this previously untranslated first version of this play.

Typos in a transcription that misplace a line from one speaker to another have absolutely no impact on my attribution tests, as the different character names would register as equivalent nouns in the linguistic elements my method counts.

Yes the fact that around half of "Shakespeare's" plays were first-published years after his death is one of the reasons for my re-attribution of these later plays to mostly Percy and Jonson as the underlying ghostwriters behind this pseudonym. Percy and Jonson collaborated in various degrees in their playwriting, and this is why previous computational-linguistic/linguistic studies of these plays have found signs of more than one author contributing. Obviously, most of the 1623 plays were either written close to that date or were completely or very heavily re-written after "Shakespeare's" death. There are several chapters in Volumes 1-2 and in the translations (Volumes 3-14) where I explain this history, past misattributions etc. fully.

EEBO has hundreds of texts from this period available in an accessible digital format for testing.

351lilithcat
Dic 10, 2021, 7:06 pm

>350 faktorovich:

you must be imagining the modern play production process where a troupe spends a year or more repeating perhaps the same re-running play, or at least gets to rehearse for months before the first staging;

You clearly know very little about "the modern play production process". With the exception of long-running Broadway-type hits, most plays run for a couple of months, if that. And the cast rarely, if ever, gets "months" to rehearse. A few weeks, including tech, at most.

352Petroglyph
Modificato: Dic 10, 2021, 8:16 pm

>346 faktorovich:
"You previously said several times that you delete pronouns from your tested words"

Remove from consideration. Not delete from the corpus. Just tell the software not to take them into account.

  1. Whenever I do so, I am explicit about it.
  2. The graphs produced by the software explicitly note at the bottom whether pronouns have been removed.
  3. I have been open about my reasons for why I (and numerous other scholars) do so: this removes some of the artificial distance between first-person novels and third-person novels, caused by their differences in pronoun proportions.
  4. The only way you even know about the "don't include pronouns in this graph" is because of #1 and #2.

I would say that this openness and pro-active reporting of the steps taken in generating the graphs are kinda the opposite of the kind of sneaky and underhanded manipulation you are accusing me of. You still owe me an apology for that, doctor Faktorovich.

"now you are denying "trawling" through the words to screen out words you view are inferior for attribution?"

  • Don't put words into my mouth. I don't consider pronouns to be "inferior for attribution". They're very useful. But I do recognize that, sometimes, in some genres, they have an outsize impact, and it can be useful to look at graphs with and graphs without them and compare them (Can your method even do that? Within seconds? Of course it can't!).
  • Uh, yes? I don't scroll manually through word lists, grinning evilly as I delete words *by hand* left and right. Here is the total extent of what I do to remove pronouns:




It's a standard option. Right there. See that orange circle? That points out a square you can tick. As a standard option. I cannot stress this enough. In a standard piece of software, used every day, by experts. (And sometimes in n00bish Lunch Break Experiments (tm) on LT). If you have issues with this that the people who code this and who use this every day haven't noticed, well, either you're a lone genius of brain-melting insight, doctor Faktorovich, or you don't know what you are talking about. Given your skills with .txt files and Excel, I'm going to side with the professional Computational Linguists on this one.
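If it helps to see what that tick-box amounts to conceptually, here's a toy Python sketch (my illustration, emphatically not Stylo's code; the pronoun list is a simplified assumption):

```python
# Toy illustration of "remove pronouns from consideration": build a frequency table
# and drop a fixed pronoun list from it, without ever touching the texts themselves.
# The pronoun list is a simplified assumption, not Stylo's actual list.
import re
from collections import Counter

PRONOUNS = {"i", "me", "my", "you", "your", "he", "him", "his",
            "she", "her", "it", "its", "we", "us", "our", "they", "them", "their"}

def top_words(text, n=100, drop_pronouns=True):
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if drop_pronouns:
        for p in PRONOUNS:
            counts.pop(p, None)          # removed from the table, not from the corpus
    return counts.most_common(n)
```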

Your mindset is that of someone who scrolls a lot and does things manually. You're wrong to assume that the professionals think like that.

You seem to be thinking that it's either "all the pronouns, every time, every test, not a single exception" or "This person must think that pronouns are never useful."

Neither is true. Stop projecting your black-and-white thinking onto me. I don't think like that.

"would create a 5000X5000X#of-texts dimensional mathematical space"

How can you ask me multiple times for the exact configuration of my data, and then proceed to skip the relevant details? You invent this ridiculous caricature of the method even though the config file is right in that zip file that you downloaded. The word list merely lists the 5000 most common words in the corpus used for that analysis. Nowhere does it say that it's a "5000X5000X#of-texts dimensional mathematical space". That ridiculous conclusion is all yours. That is a story you've concocted in your head and then, without even doublechecking it against that config file that you have at your disposal, you come into this thread and pretend that your misunderstandings are mine.

Use the sources at your disposal, doctor Faktorovich. You should have learnt how to do that as an undergrad.

"You are just repeating that you do delete some words with the # mark as "irrelevant", while also insisting that you do not "trawl" through them to delete them. "

Again this stupid accusation. Go search the word list file. I dare you to find a single word that I have hashed out. I haven't, and your repeating this accusation is a lie.

I listed reasons for why someone *might* exclude some words from consideration like that. I also stated very explicitly that I did not do so.

You owe me a retraction and an apology, doktor Faktorovich.

"Why would you keep "Project Gutenberg" or "Illustration" in your text-file. "

Your small-scale, copy-paste intensive, one-text-at-a-time workflow is showing again. It is entirely possible to write a piece of code that crawls entire websites (such as ProjGut, or newspapers, or online archives such as the Proceedings of the Old Bailey) and that retrieves all the texts that match its requirements (published in period X; between lengths x and y; of genres A, B or C). Thousands of texts for analysis can be harvested like that. Tens of millions of words. Hundreds of millions of words. At that point, it's good to a) have a look at word lists to see if any distorting crap has made it through the harvesting and automated cleaning and prepping, and b) have a central place to nix it.
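To make that concrete, here is the shape of such a harvesting step -- a hedged sketch with a placeholder URL list, not a real crawler, and certainly not anybody's production pipeline:

```python
# Sketch of a bulk-harvesting step: download plain-text files from a list of URLs,
# keep only texts in a target length band, and strip boilerplate in one central place.
# The URL list, length band and marker string are all illustrative assumptions.
import urllib.request

URLS = []   # e.g. a list of plain-text URLs gathered from an archive's index

def fetch(url):
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

corpus = {}
for url in URLS:
    text = fetch(url).replace("[Illustration]", "")
    if 20_000 <= len(text.split()) <= 200_000:   # keep texts of a workable size
        corpus[url] = text
print(f"kept {len(corpus)} texts")
```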

Your plodding and time-intensive copy-pasting works for you. I'm sure you've invested lots of time in it. But computational linguists don't work like this. Not at all.

"Why would you keep "Project Gutenberg" or "Illustration" in your text-file. I always delete these elements during the glitches-editing stage when I clean up files to prepare them for testing."

Oh. So when you do it, it's prepping your texts. When I do it, out come the accusations of manipulating the data. Interesting.

"If you fail to delete these elements, your word-total for the text is wrong, among other things."

Let us, hypothetically, consider the case of a 5000-word list with a single hashtagged-out word. Question: does the software then only consider 4999 words? Or does it bring in the 5001st word, in order to keep the "most frequent words" number constant at 5000?

I will point out that any statements you have made concerning hashtagging absolutely rely on you knowing the answer to this question. So you must know.

"My accusations against you skewing results are based not only on my previous research of data manipulation in most published computational-linguistic studies"

Prior to this thread you had not even heard of me. You cannot have included me in your previous research. Therefore, it is impossible for you to have demonstrated any results-skewing on my part in the research performed at a time when you were unaware of my existence.

This is a pretty poor example of your reasoning skills, doktor Faktorovich.

"the various contradictory and false remarks you have made across this discussion that I have pointed out"
You have, on numerous occasions, misunderstood even the simplest things (involving bollocks, ducks, attics, character encodings, ...). You conflate "I am right" with "Others are wrong". You conflate "casting aspersions on not-my-model" with "yes-my-model is correct". You conflate "Doktor Faktorovich thinks this is nonsensical" with "this is nonsensical". You engage in a lot of black-and-white thinking, and are very quick to cast yourself as the victim.

Such, I'm afraid, is the lot of all Revolutionary Mavericks. Take courage in the thought that no prophet is accepted in their own hometown. You are not the first Maverick to suffer such pangs and such slings and arrows. You won't be the last.

"You have ignored most of my criticisms, except for those that you think you can explain. "
Gish Gallop.

"Your failure to even explain the formula you said you are using is just one example of how you are not making a full disclosure here of your method, and if you did, even more falsehoods would become visible."

More sloppy thinking, Doctor Faktorovich! You seem to have confused me with someone who owes you an explanation. You are not entitled to my time. You are not entitled to my teachings. Fixing the gaps in your knowledge is not something that I am responsible for. I am not your daddy, your mummy, your teacher, your mentor, your colleague, your student, your editor, your pet-sitter, your sassy gay friend, *or* your weed dealer.

There's tons of stats help out there. Avail yourself of it. Whining that people who disagree with you on this stuff aren't taking the trouble to undertake unpaid working hours to teach you is entitled and unbecoming of someone who includes "Doctor" in their byline.

>334 faktorovich:
"You check the results of a random test before deliberately removing the words that lead your conclusions to anything other than a 95%+ re-affirmation of the existing bylines."

Here is where you accuse me yet again of doctoring my results in secret without letting you know.

Apologize for your lies about me, "doctor" Faktorovich. Show me that you can be a person I respect.

353Petroglyph
Dic 10, 2021, 10:22 pm

>336 AnnieMod: All tests seem to have been run on the texts from the printed corpus

AFAIK plenty of the kind of text editions that are freely available online (e.g. StandardEbooks, or Sacred Texts, in addition to the usual suspects) just silently modernize the punctuation and the spelling.

For instance (and I am absolutely not implying that Dr A. F. used this particular website), this section of the very extensive StandardEbooks stylesheet explains that they normalize things like italicization -- including for emphasis -- and punctuation.

354AnnieMod
Dic 10, 2021, 10:39 pm

>353 Petroglyph: Yep. The Bibliography lists the sources - most are Early English Books Online, some are Project Gutenberg, Oxford Text Archive and other available online archives (plus some modern editions). That's what triggered the question to some extent - without accounting for the level of changes caused first by the publishing process back in the day and then by the digitization, some of the metrics can be very skewed.

There are tests that can be run on a corpus with these characteristics of course - but punctuation comparison used to note similarities in style just does not sound like something that would work. A few people brought it up earlier in the thread but it was mostly ignored so I thought I'd ask directly.

355faktorovich
Dic 10, 2021, 10:51 pm

>352 Petroglyph: There is no difference for the linguistic conclusion if you delete a word or tell the system to ignore it in the computation.

Pronouns are some of the best indicators of style that very rarely are influenced by the first vs. third person, but even if they are influenced by this element, then this preference to write in first vs. third person is itself one of many characteristics of a given author's linguistic-style.

I prove that the standard Stylo attribution method is underhanded and erroneous in detail not only in my specific replies (most of which you have ignored), but also in the forthcoming Journal of Information Ethics article. Since I have supported every statement I have made in this discussion with evidence, I have proven all statements I have made and thus I stand by all of them as-is. Across this discussion, I hope I have made it clear that I am objecting to the standard Stylo and other very similar methods, and not to you personally, or to your application of this method. The steps you have described match the standard steps described in other computational-linguistic articles. The problem is not your application of this Stylo method, but the errors in the method itself.

You began by claiming that the Stylo method tests for hundreds of different features, but now we are clearly discussing the application of the single word-frequency test that considers the 5,000 most frequent words in a corpus. If you try to build a table of potential combinations of the top-5,000 words in any text you will run into the problem of having to randomly exclude most single-occurrence words while keeping others. Are the words that appear in other texts favored above those that only appear in 1 of the texts, or the other way around? If you try to create a drawing of 5,000 words in any text and 5,000 words in another text and draw lines that show the similarities between them, you are going to have a chaotic set of intersections vs. non-matches. Measuring the comparative frequency of every one of these 5,000 words that are most frequent in a group of texts is like coming up with 5,000 different relative features of a group of humans - comparing the nose-length, the eye-color, and tiny differences in elbow-length and 5,000 other such minuscule elements; then you are saying this system comes up with an average of all these elements that is a single number that measures how similar or different each of these humans is to the others; but what if they happen to have many tiny differences but they are basically the same overall height etc.? This chaos is multiplied if a corpus grows to 6 texts (as in your Austen example), and is so complex that you did not even attempt to send the full statistics file for the 16 texts in your "Lunch" experiment.

If these words were DNA, 99.9% of them would be identical in all samples. But in literature, there are many words that are present in some measure in most texts. My test for the top-6 most-common words can be easily represented in a table for researchers to question what these frequencies mean in individual texts; what words are uniquely frequent in a text is the element that registers the major intuitive tendencies of the author. The rest of the vocabulary depends heavily on the narrative/setting and various other elements that are measures of story-structure and not the linguistic-signature. All of the computational-linguistic articles I have read from the past couple of decades use this same approximate 5,000-top-words or some other unstated number of top-words to measure similarity, but their introductions claim that their method is uniquely complex and sophisticated. I have shown how Stylo introduces typos into texts when it breaks a text into pieces; even the introduction of a single mistake that is not in the original text means a given method should not be used as the program has un-repaired problems that can when multiplied across a text lead to a misattribution.
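To illustrate the kind of table I mean, here is a minimal sketch of counting the six most frequent words in each text (an illustration only; the file names are placeholders, and this is not the full 27-test method):

```python
# Minimal illustration of a "top six most frequent words per text" table.
# The file names are placeholders; this sketch is not the full 27-test method.
import re
from collections import Counter
from pathlib import Path

def top_six(path):
    words = re.findall(r"[a-z']+", Path(path).read_text(encoding="utf-8").lower())
    return [w for w, _ in Counter(words).most_common(6)]

for name in ["text_a.txt", "text_b.txt"]:    # placeholder corpus files
    print(name, top_six(name))
```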
You are repeating that using Stylo takes only moments or a lunch-break to perform an experiment, but expedience is in this case at the cost of scientific accuracy. Computational-linguists are making the process of simply opening a folder and running the standard check sound like complex programming. In truth, my method multiplies the accuracy of Stylo by at least 26 times as Stylo's 1 test is turned into my 27 tests. You have not presented evidence of Stylo or any other program being able to run the same 27 different tests my method performs. And the basic steps you have attempted to show have glitches that prevent a researcher from applying them to test the results this system generates on a larger corpus, etc. Your diagram shows that Stylo only has an option for testing "words", not punctuation and other features you advertised it can test.

Your obsession with my use of copying and pasting is mesmerizing. I could create a program that would execute the steps of my 27-test method and then it would be a software that would pop-out the answer instantly with a nice table like the one I included. But as a woman without any connections; I am not going to get any funding to develop this program or to market it to researchers. And even if it was the most brilliant program in the market; researchers like you would consider my gender a disqualifier and would refuse to use it even if it was free. The point of an authorial-attribution method is that it has to lead to an accurate and defendable attribution claim; and my method delivers on both of these points as I have demonstrated. Just because you are using software and I am automating the steps in already completed software programs that require a few extra steps to process the data; this means that my method requires more input labor; but labor is not a fault, but instead just a part of the scientific process.

356Petroglyph
Modificato: Dic 11, 2021, 1:20 am

>355 faktorovich:
"If you try to build a table of potential combinations of the top-5,000 words in any text ..."
To reiterate: this is how you imagine that Stylo analyses things. To reiterate: go check that config file. To reiterate: you have a PhD; you should know to check your data. Especially if the people you criticize have told you twice now that your misinterpretation can be corrected by actually looking at the documents you asked me to give you.

"Since I have supported every statement I have made in this discussion with evidence, I have proven all statements I have made and thus I stand by all of them as-is."

Yes, that is how things work in LitCrit: if you can support your reading with some arguments from the text you're good. But we're talking about maths here. Your arguments actually have to be objectively, mathematically correct here.

Your PhD in English and LitCrit makes you an expert in zero other disciplines.

(I will also note that you have no idea what degrees I have, or whether I, indeed, have any. Reflect on that before you say things about my credibility).

"This chaos is multiplied if a corpus grows to 6 texts (as in your Austen example)"

You keep going further and further down this "5000 x 5000 x # of texts" path, which I have repeatedly pointed out is your misinterpretation of the case. Not mine. CHECK THAT CONFIG FILE.

Do you, perhaps, require assistance in understanding it? No shame if you do. Just tell me.

"you did not even attempt to send the full statistics file for the 16 texts in your "Lunch" experiment. "

You lie, Anna. You are a lying liar who lies. >332 Petroglyph:
And I know you have seen that message and downloaded the zip file because of >334 faktorovich:

"I have shown how Stylo introduces typos into texts when it breaks a text into pieces"

Such amateurism. Like an impatient toddler. You have done no such thing. You were confronted with a piece of software importing a text from the internet that was inappropriately formatted. Instead of thinking, "Huh, I should replace this version with something that is in UTF-8", you just, incredibly, give up and blame the software. I also said in >320 Petroglyph: "nothing prevents you from taking the cleaner CBronte_jane text that was in the corpus for the first Lunch Break Experiment (tm). Odd characters should disappear there." So there was your solution. But no. Just throw your hands up, walk away and grumble about how if it's too difficult for you on the first try, it must suck completely, in general and in specific.

At the first glitch you encounter, you try nothing, and you're all out of ideas. And then you blame the software.

What do you do when your method introduces odd characters and mis-handles carriage returns? Excoriate your method and give up? Of course not. You fix the problem in the text and move on.

Double standards, "doctor" Faktorovich.

"even the introduction of a single mistake that is not in the original text means a given method should not be used as the program has un-repaired problems that can when multiplied across a text lead to a misattribution"

Wow. If a single mistake can throw your method off, then maybe we're better off not trusting it at all. There's like no tolerance for erroneous characters, no robustness at all.

Are you completely sure there isn't a single page number left in your Gutenberg files? There's a drop cap in those Percy sonnets. What about the curly brackets that MS Word introduces? Have you taken any measures to correct them? Like, since >205 melannen: and >219 Petroglyph: and >276 Keeline: explained character encoding to you? You weren't aware of that issue when you did all the analyses you wrote your self-published books about. Are you sure there isn't a single erroneous character in there throwing your results all out of whack?

If you like, you can give me your corpus. I can run some Computational Linguistic tools on them that'll identify unorthodox characters.

"Your obsession with my use of copying and pasting is mesmerizing"

Because it is so cute and quaint and precious! Like watching somebody type an entire book with two fingers! I knew a guy once who was writing his MA thesis in public-private key encryption, and the way he typed capital letters was to hit the CAPS LOCK key, find the letter he wanted by hovering his index finger over the keyboard row by row, hit it, then hit CAPS LOCK again and continue typing with two fingers. It was baffling and adorable in equal measure.

"Computational-linguists are making the process of simply opening a folder and running the standard check sound like complex programming"

Look, I hate to break it to you again, but your comfort zone is like, your issue to deal with, you know? It's not the measuring stick by which to judge statistical methods employed in applied mathematics and computational linguistics and developmental biology and a whole bunch of other fields that you do not have nearly enough experience in. Your massively un-streamlined tinkering (not even conditional formatting in your table, like, really??) is so far behind the standard tools of the trade that you don't seem to realize just how far behind you are. Or perhaps you do, and that's the source of so much of your hostility. But I will refrain from armchair-analyzing you.

"I could create a program that would execute the steps of my 27-test method and then it would be a software that would pop-out the answer instantly with a nice table like the one I included"

Our conceptions of "nice table" differ. But that's subjective. But yeah. Excel functions and macros are a thing, you know? And they can get you part of the way. Look it up. Acquire a new skill. It'll do you good.

"But as a woman without any connections; I am not going to get any funding to develop this program or to market it to researchers."

Ah, I can't speak to how women are treated in computational linguistics. If you frequent those circles, you'll know better than I do...

"researchers like you would consider my gender a disqualifier"

... but I draw the line at even more ugly smears about me. I know why you do it, but it's shit-flinging all the same. I do wish you wouldn't assume things about me.

"would refuse to use it even if it was free"

1) your method is already free.
2) we refuse to use it because it is based on naive misconceptions and because it completely obliterates any relative values in favour of 1/0. (and a whole 'nother bunch of reasons)
3) I will say that all of the software that we've talked about in this thread, R and Rstudio and Stylo and antconc are absolutely free. MS Word and MS Excel are not. Don't give your money to Microsoft. Support open source software instead!

"You have not presented evidence of Stylo or any other program being able to run the same 27 different tests my method performs."

  1. Oh, but it can. Have a look at this reference manual in pdf. For things like "passive" you have to, of course, run your corpus through a morphosyntactic tagger first, and then write your own function to extract them. So those are a bit trickier (a small sketch of that tag-then-extract step follows below).
  2. Once again, it is not my job to explain everything about stylo to you until you are willing to accept my explanation. Look it up, doctor Faktorovich.
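
Here's the kind of thing I mean by "tag first, then extract" -- a hedged sketch using spaCy's English dependency labels (it assumes the en_core_web_sm model is installed; it is my illustration, not a built-in Stylo feature):

```python
# Sketch: count passive clauses by tagging/parsing first, then extracting.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def count_passives(text):
    doc = nlp(text)
    # "nsubjpass" marks the subject of a passive clause in spaCy's English scheme.
    return sum(1 for tok in doc if tok.dep_ == "nsubjpass")

print(count_passives("The play was written quickly. Someone wrote the play."))  # -> 1
```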


"In truth, my method multiplies the accuracy of Stylo by at least 26 times as Stylo's 1 test is turned into my 27 tests."
... I'm not sure I follow. Could someone who's more mathy than me explain? My feeling is that, since there's two possibilities here (either Faktorovich is right, or I am), so there's only a 50% chance of that being true. 🙄 I dunno, though.

"The point of an authorial-attribution method is that it has to lead to an accurate and defendable attribution claim"

Here's your LitCrittiness showing up again. It is not enough for a claim to be minimally "defendable". You're trying to base your method on stats here! Those do have a right-or-wrong answer if you misapply them!

"and my method delivers on both of these points as I have demonstrated"

You've demonstrated it to your own satisfaction only. I'll say this again: either you're a lone genius of brain-melting insight, doctor Faktorovich, or you don't know what you are talking about.

You still owe me that apology, Anna Faktorovich the liar. And the longer I have to wait for it, the more opportunities I'll have to call you a liar. Because you tell lies about me.

357MrAndrew
Dic 11, 2021, 1:16 am

Awesome. *munches popcorn*

358Petroglyph
Modificato: Dic 11, 2021, 1:28 am

>357 MrAndrew:
I'm nibbling on crisps and playing some bangin' tunes in the background. We must be up to, like, four and a half articles by now. Excellent!

359faktorovich
Dic 11, 2021, 1:40 am

>356 Petroglyph: Your argument has now just deteriorated into completely nonsensical yelling, as you are attempting to attack my credibility even as you repeated that I have a PhD. I did not check if your profile includes your name or credentials. It is a good point that you might have no credentials at all and this might be the first time you are reading about computational-linguistics, as you are suggesting in your comments. I tested every step and concept, and looked up every paper you have brought up in this argument; I have explained my objections to all of them already; these objections are weighty and prove the method you are using to be faulty. There is nothing new you are saying in your latest tirade. My directions refer to a "spreadsheet"; they do not specify that users have to use Excel; they can use any free spreadsheet software. You are saying that I would have to write a program that includes my 27 tests to adjust Stylo to my method. A better approach is for me to write a new program outside of Stylo for the 27 tests, to avoid the glitches that Stylo inserts to corrupt texts it processes, as I explained earlier. All of the statements I have made in this discussion were the exact truth; the louder you deny them, the more it sounds like you keep protesting too much against having your method questioned, instead of presenting evidence to the contrary.

360Petroglyph
Dic 11, 2021, 2:28 am

>359 faktorovich:
Sure, whatever.

I'm afraid, though, that I cannot accept this as an apology for your repeated lies about me.

361MrAndrew
Dic 11, 2021, 2:42 am

>355 faktorovich:
"In truth, my method multiplies the accuracy of Stylo by at least 26 times as Stylo's 1 test is turned into my 27 tests."
>356 Petroglyph: "... I'm not sure I follow. Could someone who's more mathy than me explain?"

Happy to help. 1 times 26 = 27. Basic math. I failed an introductory math course at University twice, and even I know that.

362andyl
Dic 11, 2021, 4:48 am

>351 lilithcat:

It is as if faktorovich hasn't heard of weekly rep - you basically get a week between the first read-through and first performance.

363bnielsen
Dic 11, 2021, 7:17 am

>359 faktorovich: "Proof by Authority"?

Mathematicians have classified quite a few of those kinds of proof:
https://www.cantorsparadise.com/methods-of-proof-for-every-occasion-1d114c6fe628

BTW the Shakespeare play discussion brought into memory a very funny book I have:

Ben Ross Schneider: Travels in computerland : or, Incompatibilities and interfaces : a full and true account of the implementation of the _London stage_ information bank (1974)

It describes the process of turning the 8000 printed pages of "The London Stage" into a database in 1974. As the author says at one point: "I do not lead a life of quiet desperation; I lead a life of loud desperation."
Many of the problems of turning manuscripts into data that have been mentioned in our discussion here are also touched upon in Schneider's book.

>309 Petroglyph: made me wonder if I could detect the translator's influence in a large corpus of translated texts by the same author. I was thinking about the over one hundred Agatha Christie novels / short story collections translated into Danish by a smallish number of translators. Luckily I don't have the texts available in machine-readable format, so I'm using my lunch break to just eat lunch :-)

364lilithcat
Dic 11, 2021, 9:29 am

>355 faktorovich:

But as a woman without any connections; I am not going to get any funding to develop this program or to market it to researchers. And even if it was the most brilliant program in the market; researchers like you would consider my gender a disqualifier and would refuse to use it even if it was free.

Well, that lets you nicely off the hook, doesn't it?

Look, no one with any sense would deny that there is gender discrimination in most professions, particularly in STEM fields. But to assume, without evidence, that any and all decisions to reject your work will, in the future, be because of your gender hurts not only your credibility, but all women. Not every denial of funding to a woman or rejection of her work is due to her gender. Sometimes it's because the work doesn't deserve it. You're being very much like the boy who cried wolf.

(By the way, what makes you assume that Petroglyph is a man?)

365susanbooks
Modificato: Dic 11, 2021, 11:21 am

>356 Petroglyph: "Your PhD in English and LitCrit makes you an expert in zero other disciplines."

Don't be so generous. She's demonstrated that her knowledge of Literature & its criticism is also zero. Criticism doesn't just demand that a thesis be defendable, it demands that it be reasonably defendable. There is no support, literary, historical, or bibliographical, for the statements she's making. As I said in one of my first posts, if one of my first-year students came to me with work like this I'd be shocked at their utter inability to understand how logic works. The fact that this author is gleefully, willfully ignorant about the texts she "analyzes" doesn't make this any kind of lit crit at all.

I'm not sure what academic genre it would fall into. Maybe psych, if someone was looking at the ways her work & posts in this thread are absolute confirmations of the Dunning-Kruger effect.

366Petroglyph
Dic 11, 2021, 11:40 am

>365 susanbooks:
Well, what I overuse in generosity there I lack elsewhere, I guess.

I want to be careful when discussing credentials -- it's often a sore point and focusing on it would just lead the discussion down unhelpful pathways.

It's definitely Criticism (there's some weird corners of that field!). One former colleague of mine publishes regularly in Anthropoetics. Some of that is straight LitCrit. Some of it is... Well, I'll hold my tongue.

367Petroglyph
Modificato: Dic 12, 2021, 12:06 pm

AnnieMod, I hope you don't mind that I'm using one of your questions as a jumping-off point for another Lunch Break Experiment (tm), although this time it's more "Petroglyph reads a paper and skips the bits that go beyond them."

>354 AnnieMod: "punctuation comparison used to note similarities in style just does not sound like something that would work"

There are a lot of counterintuitive things that actually do give you promising and workable results. For instance, dividing a text into n-grams of n characters or n words. At the bottom of this page, they split a sentence into "3-grams": bunches of 3 characters. And when you use this to try and separate various authors, you get surprisingly accurate results! (I think I could retest the Wizard of Oz books like that and get pretty good results.) It's not a perfect method by any means, but it delivers more than you'd expect. Worth looking further into, in other words.
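By way of illustration, here's what splitting a string into character 3-grams looks like (a toy sketch, not the paper's code):

```python
# Toy sketch: overlapping character 3-grams of a sentence, then their counts.
from collections import Counter

def char_ngrams(text, n=3):
    return [text[i:i + n] for i in range(len(text) - n + 1)]

sentence = "call me ishmael"
print(char_ngrams(sentence)[:5])                       # ['cal', 'all', 'll ', 'l m', ' me']
print(Counter(char_ngrams(sentence)).most_common(3))
```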

But about punctuation. It matters of course exactly what you look at -- mere frequency, proportion of commas per period, sequences of punctuation marks, average number of words between two punctuation marks. I dimly remembered some papers and handbooks talking about it, so I checked my uni's library. And here, published this September, I found this paper:

Darmon, Alexandra N. M., Marya Bazzi, Sam D. Howison, and Mason A. Porter. 2021. ‘Pull out All the Stops: Textual Analysis via Punctuation Sequences’. European Journal of Applied Mathematics 32 (6): 1069–1105. https://doi.org/10.1017/S0956792520000157.

(If anyone wants a pdf, let me know).

So I tried to read it during lunch, and, while much of it went over my head (it was published in an Applied Mathematics journal; I'm a mere linguist), I can share some of the authors' comments and some screenshots of their results, since they talk about topics that have come up in this thread: punctuation as a means of assessing author style, the role of editors in punctuation, the role of online text editions. (I apologize ahead of time if I over-explain things that are obvious to more math-oriented ppl here, or if I make egregious errors. Do tell me if I'm wrong!)

From the abstract:
In our investigation, we examine a large corpus of documents from Project Gutenberg (a digital library with many possible editorial influences). We extract punctuation sequences from each document in our corpus and record the number of words that separate punctuation marks. Using such information about punctuation-usage patterns, we attempt both author and genre recognition, and we also examine the evolution of punctuation usage over time. Our efforts at author recognition are particularly successful. Among the features that we consider, the one that seems to carry the most explanatory power is an empirical approximation of the joint probability of the successive occurrence of two punctuation marks.


Emphasis mine. The bolded part means that they got better results in authorship attribution and genre allocation etc. when they looked at sequences of punctuation marks rather than individual ones.

They used "a corpus of 651 authors and 14,947 documents from the Project Gutenberg database (p. 1072), obtained via their API. Authors' birth dates range from the early 1500s through the late 2000s around 2000.

"We do not attempt to distinguish between an editor’s style and an author’s style for the documents in our corpus; doing so for a large corpus in an automated way is a daunting challenge, and we leave it for future efforts." (p. 1070)


So they do note that editors can have an influence on this particular metric, and that, in a perfect world, could be lifted out and considered separately. This is something they might work on next. They are aware that the problem exists, though, and take it into consideration. You'd need plenty of texts by different authors that have all been touched by the same editor. And you need that sort of corpus for many, many editors. Just the compilation of such a corpus is a labour-intensive operation.

The current study deals with this:

However, punctuation varies significantly across individuals, and there is no consensus on how it should be used (...); authors, editors and typesetters can sometimes get into emphatic disagreements about it.2 Accordingly, as a representational system, punctuation is not standardised, and it may never achieve standardisation. (...) It is plausible that an author’s use of punctuation is — consciously or unconsciously — at least partly indicative of an idiosyncratic style, and we seek to explore the extent to which this is the case.(p. 1072)


Now, this is a serious academic study, so Darmon et al. spend lots of time telling you exactly what they are trying to do, how they did it, and how they solved their methodological problems. So the next section may drag a bit.

Also, this is just an initial methodological study: Darmon et al. are trying to develop a workable way of checking punctuation measures in an automated fashion, to check whether the corpus they've collected is appropriate, and, well, to see if it works. This is a good basis on which to develop future iterations of this model.

I've bolded some bits in the next quote. This is a proof-of-concept type of study. Darmon et al. want to develop certain mathematical implementations of various ways to look at the patterns of punctuation. It's not the be-all and end-all of Stylometry.

We do not seek either to try to identify the best set of features or to conduct a thorough comparison of different machine-learning methods for a given stylometric task. Instead, our goal is to give punctuation, an unsung hero of style, some overdue credit through an initial quantitative study of punctuation-focused stylometry. To do this, we examine a small number of punctuation-related stylometric features and use this set of features to investigate questions in author recognition, genre recognition and stylochronometry. To reiterate an important point, we do not account for the effects of editorial changes on an author’s style, and it is important to interpret all of our findings with this caveat in mind (p. 1073)


These next two quotes are from the Data and Methodology section, where the authors tell you how they collected their data, and what choices they made in keeping/rejecting some texts. They also tell you what software they used.

We retain only documents that are written in English. (A document’s language is specified in metadata.) We remove the author labels ‘Various’, ‘Anonymous’ and ‘Unknown’. To try and mitigate, in an automated way, the issue of a document appearing more than once in our corpus (e.g., ‘Tales and Novels of J. de La Fontaine – Complete’, ‘The Third Part of King Henry the Sixth’, ‘Henry VI, Part 3’, ‘The Complete Works of William Shakespeare’ and ‘The History of Don Quixote, Volume 1, Complete’), we ensure that any given title appears only once, and we remove all documents with the word ‘complete’ in the title. Note that the word ‘anthology’ does not appear in any titles in our final corpus. We also adjust some instances in which a punctuation mark or a space appears incorrectly in Project Gutenberg’s raw data (specifically, instances in which a double quotation appears as unicode or the spacing between words and punctuation marks is missing), and we remove any documents in which double quotations do not appear. Among the remaining documents, we retain only authors who have written at least 10 documents in our corpus. For each of these documents, we remove headers using the Python function STRIP_HEADERS, which is available in Project Gutenberg’s Python package. This yields a data set with 651 authors and 14,947 documents (p. 1074)


Ooh, neat: there's a Python package specifically for handling Gutenberg texts. If I ever pick up Python, this is something I want to play with.
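For anyone curious, the basic usage looks something like this (a sketch assuming the third-party "gutenberg" package; the book ID is just an arbitrary example):

```python
# Sketch using the third-party "gutenberg" package (pip install gutenberg):
# download one e-text by its Project Gutenberg ID and strip the license header/footer.
# The ID 2701 (Moby-Dick) is an arbitrary example.
from gutenberg.acquire import load_etext
from gutenberg.cleanup import strip_headers

text = strip_headers(load_etext(2701)).strip()
print(text[:80])
```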

Here are the eleven punctuation marks Darmon et al. looked at: . , : ; ( ) ? ! " ... '
Again, they are careful to tell you what they included and what they excluded, and why. Openness and transparency!

For each document, we extract a sequence of the following 10 punctuation marks: the period ‘ . ’; the comma ‘ , ’; the colon ‘ : ’; the semicolon ‘ ; ’; the left parenthesis ‘ ( ’; the right parenthesis ‘ ) ’; the question mark ‘ ? ’; the exclamation mark ‘ ! ’; double quotation marks, ‘ “ ’ and ‘ ” ’ (which are not differentiated consistently in Project Gutenberg’s raw data); single quotation marks, ‘ ‘ ’ and ‘ ’ ’ (which are also not differentiated consistently in Project Gutenberg’s raw data), which we amalgamate with double quotation marks; and the ellipsis ‘ ... ’. To promote a language-independent approach to punctuation (e.g., apostrophes in French can arise as required parts of words), we do not include apostrophes in our analysis. We also do not include hyphens, en dashes or em dashes, as these are not differentiated consistently in Project Gutenberg’s raw data, and we find the choices between these marks in different documents — standard rules of language be damned — to be unreliable upon a visual inspection of some documents in our corpus. Lastly, we exclude square brackets (which are also sometimes called ‘brackets’), as they are used in metadata within the documents in Project Gutenberg. (p. 1075)
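If you want a feel for what "extract a sequence of punctuation marks" means in practice, here's a toy version of my own (not the authors' code): it treats the ellipsis as a single token, lumps the quote variants together as they describe, and simply skips apostrophes.

    import re

    # Toy punctuation-sequence extractor (not Darmon et al.'s implementation).
    # "..." counts as one mark; single and double quotes are amalgamated into '"'.
    PUNCT_RE = re.compile(r'\.\.\.|[.,:;()?!"“”‘’]')
    QUOTES = {'"', '“', '”', '‘', '’'}

    def punctuation_sequence(text):
        return ['"' if m in QUOTES else m for m in PUNCT_RE.findall(text)]

    print(punctuation_sequence('She said, "Wait... really?" (He did not.)'))
    # [',', '"', '...', '?', '"', '(', '.', ')']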


Here are the exact usage patterns they measured:



In other words (a rough code sketch of a few of these follows the list):

  • f1: how often does each mark occur?
  • f2: If you get a punctuation mark X, how likely is it that Y will be the next mark (with or without intervening words)? (fill in any of the eleven for X, and any of the others for Y; Darmon et al. test all possible combinations)
  • f3: how likely is any XY pair (with or without intervening words)? (fill in any of the eleven for X, and any of the others for Y)
  • f4: average words per sentence. (where a sentence is taken to end with . ! ? ...)
  • f5: average number of words between two punctuation marks
  • f6: for a pair XY, how many words, on average, occur between them? (fill in any of the eleven for X, and any of the others for Y)
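
Here's my own back-of-the-envelope version of f1, f4 and f5, just to show there's nothing mystical going on (toy code, not the paper's implementation; my sentence-splitting is much cruder than theirs):

    from collections import Counter
    import re

    def f1_frequencies(marks):
        # f1: relative frequency of each punctuation mark in a sequence of marks
        counts = Counter(marks)
        total = sum(counts.values())
        return {m: c / total for m, c in counts.items()}

    def f4_words_per_sentence(text):
        # f4: average number of words per sentence (a sentence ends at . ! ? or ...)
        sentences = [s for s in re.split(r'\.\.\.|[.!?]', text) if s.strip()]
        return sum(len(s.split()) for s in sentences) / len(sentences)

    def f5_words_between_marks(text):
        # f5: average number of words between two consecutive punctuation marks
        chunks = [c for c in re.split(r'\.\.\.|[.,:;()?!"]', text) if c.strip()]
        return sum(len(c.split()) for c in chunks) / len(chunks)

    sample = 'Happy families are all alike; every unhappy family is unhappy in its own way.'
    print(f1_frequencies([';', '.']))      # {';': 0.5, '.': 0.5}
    print(f4_words_per_sentence(sample))   # 14.0
    print(f5_words_between_marks(sample))  # 7.0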


Faktorovich does f1 for these five punctuation marks: , : ; ? !
She does f4, too. The others are not part of her method (because the online tools aren't equipped for them; in fact, I wonder how they define "end of sentence").

Ok. I'll skip over most of the actual methodology, in which they mathematically operationalize these six ways of looking at punctuation, because I must admit that the mathematics are way beyond me:



And this is just the beginning. They also employ Kullback-Leibler divergence, Shannon entropy, and others. I don't know what any of these are. There's a great deal of talk about how they use neural networks, too. Interesting, I'm sure, but I'm lost here.
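(For anyone who wants the one-line versions: Shannon entropy measures how evenly spread out a frequency distribution is, and Kullback-Leibler divergence measures how far one distribution is from another. A toy sketch with made-up numbers, definitely not the paper's actual machinery:)

    import math

    def shannon_entropy(p):
        # How evenly spread a distribution is (in bits); higher = more spread out.
        return -sum(x * math.log2(x) for x in p.values() if x > 0)

    def kl_divergence(p, q):
        # How different distribution p is from q (not symmetric); 0 means identical.
        return sum(x * math.log2(x / q[m]) for m, x in p.items() if x > 0)

    # Made-up punctuation frequencies for two hypothetical authors.
    author_a = {'.': 0.50, ',': 0.40, ';': 0.10}
    author_b = {'.': 0.45, ',': 0.35, ';': 0.20}

    print(shannon_entropy(author_a))          # ~1.36 bits
    print(kl_divergence(author_a, author_b))  # ~0.05: small, i.e. quite similar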

Again, if you want the pdf to check for yourself, let me know.

Anyway: here are their results for how their measures f1 through f6 correctly separate authors (open in separate tab for larger version):



The percentages in each of the f-columns show what proportion of texts each test assigned to the correct author. Some measures start at 80-90% correct separation of 10 authors; one starts at around two thirds. As more and more authors and texts are included, the results become less reliable. Darmon et al. also show what results you get if you test f1, f3, f4 and f5 together, and if you take all 6 together. Better results here than for each of the features taken separately.

To be clear: f1, f2, f3, f5 and f6 each run on all eleven punctuation marks simultaneously. To translate this into Faktorovich units: her 5 punctuation marks and sentence length (which is f4 here) count as 6 of her 27 tests; so f1 counts as 11 tests, and then f2, f3, f5 and f6 each count as however many ordered pairs you can make from those 11 marks (a hundred-odd tests each!).

Darmon et al. also look at whether their f1 through f6 can be used to tell various genres apart. Here are the results for that set of experiments (open in separate tab for larger version):



Not great. Not terrible, either, but not all that great. The highest score they have is 65%, for measure f3.

Finally, Darmon et al. also look at changes in punctuation over time (I've hyperlinked their Figures 13 and 14):

We show the evolution of punctuation marks over time for these 616 authors in Figure 13 and Figure 14, and we examine the punctuation usage of specific authors over time in Figure 15. Based on our experiments, it appears from Figure 13 that the use of quotation marks and periods has increased over time (at least in our corpus), but that the use of commas has decreased over time. Less noticeably, the use of semicolons has also decreased over time. In Figure 14, we observe that the punctuation rate (which is given by formula (2.6)) tends to decrease over time in our corpus. However, this observation requires further statistical testing, especially given the large variance in Figure 14. Because of our relatively small number of documents per author and the uneven distribution of documents in time, our experiments in Figure 15 give only preliminary insights into the temporal evolution of punctuation, which merits a thorough analysis with a much larger (and more appropriately sampled) corpus. Nevertheless, our case study illustrates the potential for studying the temporal evolution of punctuation styles of authors, genres and literature (and other text). (pp. 1092-3)


Darmon et al. note that these trends over time have to be taken into account for future studies. If the use of commas and semicolons indeed trends downwards over time, and quotation marks go up, then you can't use the exact same calibration for texts from the 1500s as texts from the 1900s or the 2000s.

Ok, so what do Darmon et al. have to say about their results in general? In their own words:

One feature, which we denoted by f3, measures the frequency of successive punctuation marks (and thereby partly accounts for the order in which punctuation marks occur). Among the features that we studied, it revealed the most information about punctuation style across all of our experiments. It is worth noting that, unlike the feature f2, which also accounts for the order of punctuation marks, f3 gives less weight to rare events and more weight to frequent events (see equation (2.3)). This characteristic of f3, in concert with the fact that it accounts partly for the order of punctuation marks, may explain some of its success in our experiments.


They had most success with f3 ("how likely is any pair of punctuation marks?"): looking at the order in which an author uses punctuation marks seems to be more revealing of their style than measures based on how frequently she uses them (though the latter also give ok-ish results; sequences just appear to do better).

They also add this:

It would be interesting (although daunting and computationally challenging for someone to do it with Project Gutenberg) to try to gauge whether and how much different editors affect authorial style. (p. 1095)


and this:

"It is also worth reiterating that Project Gutenberg has limitations with the cleanliness of its data. (See our discussion in Section 2.1 for examples of such issues.) These issues may be inherited from the e-books themselves, and they can also arise from how the documents were entered into Project Gutenberg. Although we extensively cleaned the data from Project Gutenberg to ameliorate some of its limitations, important future work is comparing documents that one extracts from Project Gutenberg with the same documents from other data sources." (p. 1095)


In their conclusions, Darmon et al. mention that further developments of these methods could be applied to different translations of the same work, or of different editions of the same text, as well as trying to tease out "the effects of an editor’s or journal’s style on documents by a given author (an especially relevant study, in light of the potential to confound such contributions in corpuses like Project Gutenberg)" (p. 1096)

Ok. Here ends today's Lunchtime Reading Hour with Petroglyph (tm). And here, at the endpoint of my ramblings, I can't resist pointing out this paper:

Whissell, C. (1996) Traditional and emotional stylometric analysis of the songs of Beatles Paul McCartney and John Lennon. Comput. Human. 30, 257–265.

This looks like fun!

368Petroglyph
Dic 11, 2021, 12:28 pm

>364 lilithcat: (By the way, what makes you assume that Petroglyph is a man?)

My writing style is probably more similar to that of male LT commenters than to that of female LT commenters.

369susanbooks
Dic 11, 2021, 1:04 pm

>368 Petroglyph: And, as Barbie says, Math is hard! -- therefore, by the sort of proof displayed here, I say you're a ghostwritten man.

370AnnieMod
Dic 11, 2021, 1:46 pm

>367 Petroglyph: The one time in my life when a sentence I write is taken kinda out of context and I don’t mind (because then the explanation proves the context). I love math on Saturday morning. :) What a surprise - editors are important when you study distribution of the punctuation. :)

PS: I’d love that PDF - not because I want to check your work but because it made me curious.

371drneutron
Dic 11, 2021, 2:29 pm

>369 susanbooks:, >370 AnnieMod: Yeah, I'm interested too. Since it was published in Applied Math, I was pretty sure I could find a preprint on arXiv.org, a massive preprint database for physics/math/comp sci/etc. Sure enough, here it is!

https://arxiv.org/abs/1901.00519

And thanks to Petroglyph for starting me down a fascinating rabbit hole - long time R user, but never for this sort of thing...

373bnielsen
Dic 11, 2021, 3:09 pm

>367 Petroglyph: Markov chains are great fun (and part of a rather large statistics course that I was required to take). The basic idea is that you have a number of states (or nodes in a graph or whatever) and probabilities for going from one state to another.
So the matrix they generate is just these probabilities.
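
A toy example of what such a transition matrix looks like for a punctuation sequence (just an illustration of the idea, not anything from the paper):

    from collections import Counter, defaultdict

    def transition_probabilities(sequence):
        # Estimate P(next mark | current mark) from an observed sequence of marks.
        pair_counts = defaultdict(Counter)
        for current, nxt in zip(sequence, sequence[1:]):
            pair_counts[current][nxt] += 1
        return {current: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
                for current, nxts in pair_counts.items()}

    seq = [',', ',', '.', ',', ';', '.', ',', '.', '.']   # made-up sequence
    print(transition_probabilities(seq))
    # {',': {',': 0.25, '.': 0.5, ';': 0.25}, '.': {',': 0.66..., '.': 0.33...}, ';': {'.': 1.0}}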

I've used Markov chains to generate random programs for testing a compiler (not Python though).

People have also used it to generate random text with the same patterns as say French.

Thanks for sharing the study.

374Petroglyph
Dic 11, 2021, 3:11 pm

>369 susanbooks:

Who's to tell who my ghostwriter is? I could be a woman born in the 1930s, who went on to be one of the original computers in the 1940s and 50s (back when tedious data input was done by women). I wonder how many kilogirls this thread is worth by now ;)

375Petroglyph
Dic 11, 2021, 3:15 pm

>373 bnielsen:
Thanks for the explanation. I can see the, well, what I call the general shape and structure of how that works, but I'd have to spend some actual thinking time with a tool/book/person to feel semi-confident in talking about them.

376bnielsen
Dic 11, 2021, 3:25 pm

>374 Petroglyph: I'm guessing it must be even older than that. A petroglyph must surely be from the stone age, so maybe all text is written by a single ghostwriter? (I think we are heading towards solipsism-by-proxy.)

Darmon-et-al style: .,?(.)

377Crypto-Willobie
Dic 11, 2021, 3:40 pm

Life's too short...

378Petroglyph
Dic 11, 2021, 3:42 pm

>376 bnielsen:
I wonder who we could name that condition after...

379faktorovich
Dic 11, 2021, 4:18 pm

>367 Petroglyph: The frequency of different punctuation marks is a highly precise method of authorial-attribution. This is why I used it as a significant component in my method, so this article’s main experiment reinforces the strength of my method. The tests applied in this article are not relevant for the “Lunch” experiment Petroglyph has included because he tested for word-frequency and not for punctuation frequency. The most frequent letters and words are also very accurate (based on the hundreds of texts I have tested), but splitting all letters into non-computable 3-letter and 3-word etc. clusters is not at all relevant or rational for authorship-assignment.

"mere frequency, proportion of commas per period, sequences of punctuation marks, average number of words between two punctuation marks." Imagine if you were attempting to solve a murder. The "mere frequency" test is equivalent to merely testing the DNA left at the scene and its frequencies or markers against suspects; the alternative "portion of commas per period" would be like testing the DNA against the finger-prints - it is a good idea to do both, but why compare them to each other if both provides accurate results; and this method is not only asking to compare them put to establish the quantity of DNA left vs. the size of the finger-prints left; making this comparison of oranges to apples is nonsensical because it digresses away from the murder mystery at-hand into playing with math equations to give more work-hours to the technician/ researcher on the case. For example, "average number of words between two punctuation marks" is already calculated in my tests by the separate tests for the average number of words per sentence, and the frequency of the various significant punctuation marks; these elements are already combined when I incorporate these 27 different tests into the final results, so also measuring the number of words between the marks adds a test to the analysis that repeats the findings of several of the already accounted-for tests. The term the article uses for this test-doubling is " joint probability" - this is a statistics/investment term that means checking the probability of 2 events happing together etc. This measure is significant in finance to estimate the behavior of stocks based on 2 different events happening that might influence their direction. In linguistics, the texts being tested have already been written and there is nothing unknown about what words/ punctuation they might include. Therefore, the testing of elements like punctuation and words-per-sentence or words-per-paragraph are more accurate when they are performed separately; and recording the data of these tests separately helps researchers to figure out if specific punctuation marks or specific sentence/paragraph length contributed most significantly to a given attribution; this data helps to figure out the impact each element has in relation to genre/ period etc.; when these tests are combined into a formula and the separate measurements or data-points are not provided, the researcher does a disservice to readers interested in further exploring the results.

"We remove the author labels ‘Various’, ‘Anonymous’ and ‘Unknown’." This step can be used to remove half of the Renaissance texts because over half of them were initially anonymous. So this elimination can be a serious bias that can skew the results of this study, as it would not recognize similarities between "Shakespeare" and otherwise bylined texts from these decades and these anonymous etc. texts, some of which were significantly first-published after "Shakespeare's" death.

Another problem, "we remove any documents in which double quotations do not appear": this can be used to exclude all old-spelling versions of texts in favor of only heavily edited modern editions that have been standardized with double quotation marks. As I explain this becomes essential for attribution in the Renaissance where the old-spelling/old-punctuation version of the anonymous "King Leir" does not match the modern-spelling version of the "Shakespeare"-bylined "King Lear", but does match the old-spelling "Shakespeare"-bylined version of "King Lear". The heavy translation process that has been applied to modernize Renaissance texts makes them incomparable to originals, and if different editors were involved not-comparative to each other in these modern versions either.

Yet another problem, "we retain only authors who have written at least 10 documents in our corpus"; as we have discussed in this chat before, the texts by authorial-bylines with under 10 documents to their names are the most likely to have been the work of ghostwriters, so establishing if these match texts with other bylines should be the primary concern of researchers attempting to test an enormous corpus for attribution. The exclusion of these under-10 text writers again skews the results towards re-affirming the bylines of the prolific writers without recognizing similarities between their styles and possibly some of these excluded minor "writers".

I already pointed out the basic unnecessity of testing punctuation in combination with word count and in random combinations with other punctuation marks. My method achieves the same combination of these different measures that is beneficial for attribution, without each sub-measure being duplicated in several formulas that measure convoluted combinations before being averaged out in the final attribution. Only somebody that deliberately wants to prove attribution with computational-linguistics is impossible would calculate the odds of every occurrence of punctuation mark X after a Y punctuation mark. The simple frequency of X and Y established the authorial-style, whereas these combinations can generate wildly random results, and especially so if all of these different combination possibilities are combined into a single attribution formula. I do not use F 2-3 and F 5-6 in my method not because I lack the tools to apply them, but because there is no added level of attribution certainty to performing these mathematical magic tricks that are only for show, and not for substance. For example, the “number of words between two punctuation marks” is the same as measuring the length of sentences and the frequency of commas; the tests of comma and period frequency give the same basic measure of words between these punctuation marks.

You have cut off the portion of the article where the elements of the main equation for the punctuation-marks matrix are defined. If you have read math-papers before, surely you would recognize that including the final product of a formula, and not its components, makes it impossible to criticize the errors in the steps. I do not have access to the full paper, so I cannot fully explain what is wrong with this final formula. And not knowing terms like “Kullback-Leibler divergence” is easily solved by looking them up: it is basically a comparative analysis of statistical distribution. “Shannon entropy” refers to uncertainty vs. certainty of the output. Neither of these concepts is needed to attribute authorship, as my simple method proves. Uncertainty and the full range of distribution become problems when the mathematical equations applied are unnecessarily convoluted so that they themselves introduce uncertainty or chaos instead of just solving the attribution mysteries they should be designed to solve. If you shared the pdf with me and I explained the precise mathematical errors in their approach, you would clearly not pay attention to my explanation, as you are skimming through their explanation, and they are researchers whom you respect.

Their “Accuracy for the testing set” measures the percentage of cases where the byline attributions in the texts matched their authorial-signature assignment. It indicates that they achieved the highest degree of accuracy on test f3 when only testing 10 different bylines, with f1 second-best, f5 third, and f4 fourth. Yet they are suggesting that combining these tests into an average leads to an overall accuracy that is close to the top of the four, f3 (93%), vs. 89% overall, when f4 was only 64%; a simple average of the 4 measures equals 82%, so they have tinkered even with this average measure. These findings also mean that one of their more convoluted tests for punctuation pairs was the most accurate test, while one of the tests I have found to be especially reliable (words-per-sentence) is the least accurate. Of course the problem with the words-per-sentence test alone is that different authors can have similar words-per-sentence frequencies, but they can easily be distinguished on other measures included in my 27-test method; this is why I do not rely on any one of these tests by itself. This article’s method also proved to disintegrate when the corpus was expanded to 100 authors (my corpus for the Renaissance had 104 authors) – in this corpus, 3 out of 4 of the measures were at a 50/50 chance of accuracy or lower, with one as low as 37%; but the combined f1, f3, f4, and f5 average remained at an irrationally high level of 79%. The bigger problem is that accuracy of attribution might have been increasing as the number of authors grew, while only the byline accuracy was decreasing; in other words, the giant corpus might have finally allowed the researchers to accurately determine which of the texts shared linguistic signatures or had single underlying ghostwriters, but these matches are dismissed as inaccuracy because all divergences from the stated bylines are judged to be errors.

On the genre-experiment, their result was that the computational-linguistics tests they applied cannot be used to distinguish between genres. This is exactly correct. Any accurate attribution method should not be impacted by the genre used in the texts. Genre should be considered by literature scholars to understand how it impacted the text’s data, but authorial-attribution works outside of genre-lines. My method found multi-genre texts that matched a single authorial-signature for all of the Renaissance ghostwriters.

The graphs do not make any rational sense. Graph c has frequencies between 0 and 9%, d up to 4%, but a and b stretch between 0 and 60%. Why would the punctuation rates be 10 times higher in a and b vs. c and d for the same centuries? And this graph does not have a legend for the different line colors, so it is unclear whether exclamation or semicolon usage is going up or down.

Overall this article is a great example of misleading, deliberately convoluted, and obviously erroneous methodology that is the reason this field needs to be fixed by instead applying my accurate attribution method. The fact that data has been manipulated is apparent from the broad findings. They do not present the raw data and the data for the steps of their calculations or the precise f1-6 data points before these were incorporated into these seemingly simple summary-tables and visuals. The raw data is not likely to be anywhere as neat and orderly, and I would assume that if I examine the exact calculations involved in each step of this process, I would find the specific errors and apparent data manipulation involved that has led to these results that seem to represent a simple relationship between corpus-size and byline re-affirmation accuracy. My own testing contradicts this result as many bylines share linguistic-signatures when a ghostwriter is involved etc. I could create several colorful graphs to explain my method, and could phrase my method in terms of formulas, but this would do nothing but confuse the public who might just need the basic steps to reproduce or test my results.

380Petroglyph
Dic 11, 2021, 4:42 pm

>379 faktorovich:
"average number of words between two punctuation marks" is already calculated in my tests by the separate tests for the average number of words per sentence, and the frequency of the various significant punctuation marks; these elements are already combined when I incorporate these 27 different tests into the final results, so also measuring the number of words between the marks adds a test to the analysis that repeats the findings of several of the already accounted-for tests"

This comment, Ladies and Gentlemen, makes me giddy.
Faktorovich tests the absolute frequencies of five punctuation marks: , : ; ? !. Let's forget that Darmon et al. do 11, and just entertain this for a second. Sure. Darmon et al. test at least five, too.
A separate test that her online tools perform is average sentence length. This, Darmon et al. do, too.
Faktorovich imagines that, merely by testing these things separately and independently, and then by obliterating their independent results into a binary opposition (for ease of counting), she also has tested their interaction.

Holy fuck, you guys.

No wonder she feels she's gone beyond the laws of probability. (>305 faktorovich: "The more I apply computational-linguistic tests to texts and research these authorship mysteries, the more convinced I am that there are no coincidental matches; all matches indicate shared authorship.")

At this point I feel it's necessary to ask: are we sure that the person posting under this moniker really is Dr. Anna Faktorovich? Are we really sure it's not some troll going around posting mindblowing nonsense under her name to make her look bad?

Holy Dunning-Kruger, Batman!

Doktor Faktorovich, I stand in awe at the depths of your misunderstanding. I must thank you for bringing this unexpected jolt of mirth into my life.

I'm sure I'll have more to say later -- I haven't made it further than that quote, tbh. Truly, I stand in awe, and I wanted to take a minute to live that moment to the fullest.

381igorken
Dic 11, 2021, 5:28 pm

I wonder what the result would be when comparing the posts in this thread by faktorovich and Petroglyph using faktorovich's method.

382lilithcat
Dic 11, 2021, 5:33 pm

>381 igorken:

I think we'd find that they were all actually written by Tim Spalding.

383Petroglyph
Dic 11, 2021, 5:51 pm

So this is what becomes of Matilda and other children that Roald Dahl writes about: they grow up into adults with the doggedness and the sheer mental force to reject reality and substitute their own. I genuinely mean that as a compliment, by the way, no joke, no tongue in cheek: I love Matilda.

Do you read fiction at all, Dr Faktorovich? If so, you might enjoy Angel by Elizabeth Taylor, or even Hadrian the Seventh by Frederick Rolfe. Both these books are about people like yourself: strong-willed characters who believe so strongly in their own version of reality that the World At Large has no choice but to yield. (I've written reviews of both of these.) Or if you're more into biographies, I can also recommend The Quest for Corvo: an experiment in biography by A. J. A. Symons -- a biography of Frederick Rolfe that is... *chef's kiss* ... absolutely riveting. (My review here.)

I've now also sent you a DM asking for a pdf copy of your book The Re-Attribution of the British Renaissance Corpus. I'll be happy to read it over the end-of-year break.

384Petroglyph
Dic 11, 2021, 5:51 pm

>382 lilithcat:
So *that* is why changes to the site always take "two weeks"!

385faktorovich
Dic 11, 2021, 8:14 pm

>383 Petroglyph: I just started putting together the 15 book reviews I am writing for my Pennsylvania Literary Journal. One of these is fiction. I requested all except for the fiction title from these publishers. I tend to only read fiction when I am researching it for an article/book, or reviewing it. In the last few years (before a downturn in the number of review copies publishers send) I have been reviewing around 150 book titles per year for PLJ. And I read thousands of books annually of all types as I work on my research. Given that research has been my full-time job for most of the last 10 years, the line between "fun" reading and professional reading has blurred. I wrote a book about formulaic writing, "Formulas of Popular Fiction", where I explain my views on most pop genres/ pop books. I do not escape into books to find myself, but rather read books to find the underlying truth about their authors. There is nothing fantastical about my Re-Attribution series. It might seem fantastical if you read my summaries of its volumes and the re-attribution conclusions, but as you should find out as you review the series itself, it presents overwhelming concrete evidence for the version of history my re-attributions indicate.

386Petroglyph
Dic 11, 2021, 9:02 pm

>385 faktorovich:
That's ok. I don't judge people for their reading habits.

Also, I'm not suggesting that your methods are fantastical or fictional. I'm saying that they're nonsense.

387Petroglyph
Dic 11, 2021, 9:09 pm

>379 faktorovich:
"The most frequent letters and words are also very accurate (based on the hundreds of texts I have tested), but splitting all letters into non-computable 3-letter and 3-word etc. clusters is not at all relevant or rational for authorship-assignment."

Ok, I'm sorry, everybody, but I've got to explain how completely misguided this is.

Let's take that example of a character ngram where n=3 from this page:
“Happy families are all alike; every unhappy family is unhappy in its own way.”

Split this into strings of three characters, or trigrams, and you get this (a space is also a character):
"h a p" "a p p" "p p y" "p y " "y f" " f a" "f a m" "a m i" "m i l" "i l i" "l i e" "i e s" "e s " "s a" " a r" "a r e" "r e " "e a" " a l" "a l l" "l l " "l a" " a l" "a l i" "l i k" "i k e" "k e " "e e" " e v" "e v e" "v e r" "e r y" "r y " "y u" " u n" "u n h" "n h a" "h a p" "a p p" "p p y" "p y " "y f" " f a" "f a m" "a m i" "m i l" "i l y" "l y " "y i" " i s" "i s " "s u" " u n" "u n h" "n h a" "h a p" "a p p" "p p y" "p y " "y i" " i n" "i n " "n i" " i t" "i t s" "t s " "s o" " o w" "o w n" "w n " "n w" " w a" "w a y"

It has been demonstrated that, just like relative word frequencies can help in distinguishing two authors' bodies of work, trigrams (n-grams of length 3) can do so, too. You get pretty good results. Not perfect, but pretty good! And I think that I can also convincingly argue that trigram frequencies can, indeed, correctly attribute an author's work to them.
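
(If you want to see how trivially a machine does the counting step, here's a toy version of my own, not the actual stylometry package I used for the graph below:)

    from collections import Counter

    def char_ngram_frequencies(text, n=3):
        # Relative frequencies of overlapping character n-grams (spaces count as characters).
        text = text.lower()
        grams = [text[i:i + n] for i in range(len(text) - n + 1)]
        counts = Counter(grams)
        total = sum(counts.values())
        return {g: c / total for g, c in counts.items()}

    freqs = char_ngram_frequencies(
        "Happy families are all alike; every unhappy family is unhappy in its own way.")
    print(Counter(freqs).most_common(5))   # the five most frequent trigrams in this tiny sample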

First, I'll show you. Here is a cluster graph for the Oz novels in >309 Petroglyph:, generated using character n-grams of length 3 (with the pronouns left in):



All of Baum's novels cluster together, all of Thompson's novels cluster together, and the mystery book suspected to be authored by Thompson (which was confirmed via several stylometric analyses) has also been judged to be extremely similar to the Thompson books. Just like in >309 Petroglyph:, where word frequencies were used.

So, empirically, character trigrams and single-word frequencies give similar results. (Anyone arguing that this method is faulty must also explain why both methods would be faulty in exactly the same direction.)

Second, I'll explain the theory behind why this works.

Alright. An "n-gram" is simply a "gram" -- a connected string of characters -- of length n.

We humans tend to think in units of meaning, and for English those include (but are not limited to) words, phrases, sentences, chapters, novels, trilogies, ... But computers don't think like that. To them, a character, a word, a phrase, a sentence, a paragraph, a chapter, a novel, a trilogy, a corpus ... are just strings of characters of different lengths. They're the same category of thing, just of different lengths. And you can totally use strings of different lengths to distinguish two authors from each other. Even Faktorovich agrees with that. Here are some examples:

  • If your method looks at the frequency of three-word phrases, it looks at the frequencies of word n-grams of length 3. These can be useful in authorship attribution
  • If your method looks at single word frequency, it looks at word n-grams of length 1. These, too, can be useful in authorship attribution.
  • If your method looks at single character frequencies, it looks at character n-grams of length 1. These, too can be useful in assigning authorship
  • A character n-gram of length three is not some totally different category from single characters and words and phrases. It is merely the logical observation that, given the previous bullet points, strings of characters that are longer than 1 but shorter than the average word can also be useful in authorship attribution

A machine doesn't care if it is told to chop the long long string of characters that is a novel into various pieces of uneven length at the spaces and punctuation marks (~ words), or into even pieces of three characters (trigrams), or into even pieces of single characters (character n-grams of length 1, aka single letters & punctuation). It just does what it's told.
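
(To make the "same category of thing" point concrete: here's a sketch where the exact same function chops a text into character trigrams, single words, or three-word phrases, depending only on what units you hand it. My illustration, nothing more.)

    def ngrams(units, n):
        # All overlapping n-grams over a list of units (characters, words, anything).
        return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]

    sentence = "every unhappy family is unhappy in its own way"

    print(ngrams(list(sentence), 3)[:3])    # character trigrams: ('e','v','e'), ('v','e','r'), ('e','r','y')
    print(ngrams(sentence.split(), 1)[:3])  # word 1-grams: ('every',), ('unhappy',), ('family',)
    print(ngrams(sentence.split(), 3)[:2])  # word trigrams: ('every','unhappy','family'), ('unhappy','family','is')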

The reason that relative frequencies of character trigrams can be useful in assigning authorship is exactly the same reason why individual word frequencies can. This would be obvious to someone who's been doing stylometry research for years. Right? Right?

Dr Faktorovich. If you really, truly, genuinely believe that looking at n-grams is "non-computable" and "not at all relevant or rational for authorship-assignment", then I recommend you read The Re-Attribution of the British Renaissance Corpus, by Dr. Anna Faktorovich. That book contains Volumes 1-2 of her Re-Attribution Series, and in its densely argued 698 pages, she argues for this infallible and extremely sophisticated computational-linguistic method that she's invented. This method makes extensive use of word n-grams of length 3, word n-grams of length 1, and character n-grams of length 1.

388faktorovich
Dic 11, 2021, 9:39 pm

>387 Petroglyph: If you split all of the letters in a text into trigrams, according to your example, you would be multiplying the quantity of letters in the test-sample by at least 6, and each of these X6 portions would be measuring basically the same letters but in different combinations. Retesting the same letters six times in different combinations is the definition of nonsensical research. My method demonstrates that just testing the most frequent letters and words is the rational approach. Your diagram can be a creative art project without any data to support it. A diagram that shows "correct" or byline-affirming attributions does not prove it is correct; the errors I have been explaining are not in your capacity, or that of other researchers in this field, to draw diagrams that make them seem believable, but rather in your erroneous method and the lack of disclosed underlying raw data.

Your n-grams for 3-letters example is not for "different length". If they were "different length" this would just break the rules of the experiment of testing 3-letter n-grams, but it would do nothing to establish attribution. Measuring the most frequent units (words, letters, phrases) is most useful for attribution. The 3-word phrases are indeed 1 example of n-grams or strings that I use, but I do not compare these phrases quantitatively, but rather use them to verify if the quantitative tests match any pattern that emerges between these phrases. I am not arguing against the testing of all strings of more than one unit, just against any method that repeats the testing of the same units with the same tests by creating multiple nonsensical combinations of these units. The term n-grams can signify a rainbow of different things, so it is best to avoid it and instead to state if what is being compared is words, letters, phrases etc. to be specific in a research article; instead scholars in this field use the term n-gram especially when they do not want to reveal that they are only testing for word-frequency, as saying n-gram instead can lead a reader to assume something more "advanced" is actually being tested.

If as you say "relative frequencies of character trigrams" are "exactly" as useful as "individual word frequencies"; then, you have lost any reason to object to my method. You have been saying that the only real error in my method is that it does not use these "relative frequencies of trigrams"; so if my method is just as statistically applicable, it is equally as correct according to this principle. The problem is that the linguistic paper we are now discussing (and others like it) use the complexities of dividing texts into many strange parts, and various other tricks to confuse casual readers so they do not notice that the data to support the conclusions or the pretty final diagrams is missing, or the provided data are summaries statistics and do not show where this data came from or how it was processed to get the final summary-results.

389amanda4242
Dic 11, 2021, 9:46 pm

>383 Petroglyph: I'm in the middle of Angel, and yes, the resemblance is striking.

390Petroglyph
Modificato: Dic 11, 2021, 10:33 pm

>388 faktorovich:
"you would be multiplying the quantity of letters in the test-sample by at least 6 and each of these X6 portions would be measuring basically the same letters but in different combinations. "



... Multiplying? I don't split letters into trigrams. I just tell the software to look at strings of three characters. The characters that are there in the text. No new characters are added, or multiplied, or counted more than once. Where do you get these absolutely unfounded ideas? Have you ever thought of starting a cult?

"testing the most frequent letters and words is the rational approach"
Relative frequencies. Normalized against averages. But yeah.

"Your diagram can be a creative art project without any data to support it"
It can also be just a reflection of an honest data analysis. You just don't trust anything that's too far outside your comfort zone. We've been here before.

"Your n-grams for 3-letters example is not for "different length". "
Correct. I said "a character, a word, a phrase, a sentence, a paragraph, a chapter, a novel, a trilogy, a corpus ... are just strings of characters of different lengths. They're the same category of thing, just of different lengths". A word is a string of characters; so is a phrase; so is a sentence; so is a paragraph; and so on. They're all character strings, but a word does not have the same length as a paragraph, a novel, a trilogy...

Really, you're misreading me at such a basic level that my troll-senses are tingling.

"The 3-word phrases are indeed 1 example of n-grams or strings that I use, but I do not compare these phrases quantitatively, but rather use them to verify if the quantitative tests match any pattern that emerges between these phrases."

Lady, you verified Jack Shit.

"relative frequencies of character trigrams" are "exactly" as useful as "individual word frequencies"
Relative word frequencies, yes. I have not changed my position at all.

Look. You keep misreading sentences and misinterpreting them in the most literalist of ways. Is this you trolling us, or is this you being neurodivergent?

"the complexities of dividing texts into many strange parts"
... punctuation is strange parts now?

"various other tricks to confuse casual readers so they do not notice that the data to support the conclusions or the pretty final diagrams is missing, or the provided data are summaries statistics and do not show where this data came from or how it was processed to get the final summary-results"

Your problem, throughout, seems to be that scholarly literature written for experts is hard (or even impossible) to comprehend by non-specialists. That is not a bug, lady, that is a feature. There is value in being able to have conversations at various levels of complexity. Look at Judith Butler's writing. Non-LitCritters bounce off stuff like that. Car mechanics can say things about engines that non-mechanical people are unable to follow. Every profession has this. Maths and linguistics and theology and archaeology and evo-devo and even tree surgeons.

If mathematicians writing to each other had to dumb down their stuff so that non-mathy people could clearly follow everything, there'd be no scholarly literature for mathematics. The same is true for any other advanced field of knowledge -- including car mechanics, electricity, dam construction, cement mixing, glass blowing, sex workers, media criticism and even fanfic writers. With respect to that last one: Are you aware that Drapple is a cargo ship?

This is not done to intentionally shut out non-specialists. In fact, for stats and maths and LitCrit and so many others there is a whole university system dedicated to training interested laypeople into becoming specialists so that they can have advanced conversations with fellow specialists. If the goal were to shut people out, there'd be no BA, MA or PhD programmes! There's masses of educational materials out there, all free of charge, and of exquisite pedagogical quality.

You claim to have been active in that university system at multiple institutions and in more than one country. Are you part of the Great Conspiracy to keep the masses away from arcane knowledge and confuse them?

Look at it from the other direction. This discussion we've been having would be incomprehensible to a seven-year-old. The books you've authored would be. They wouldn't be able to read or understand it, and they haven't got the attention span to keep up. Not every discussion among adults should be held at the level of a 7-year old. Similarly, not all discussions about monte carlo markov chains should take into account the people who don't know what that is.

If you had your way, discussions about any advanced subject would be impossible because of all the sealioning.

Learn what that word means, Faktorovich. I'm going to use it again in referring to you.

You have this fundamental problem with not understanding basic maths, with scholarly literature that you do not understand, and it bothers you so much that you jump to conspirational conclusions.

Neurotypical people don't do this. Neurotypical people understand that different levels of complexity are appropriate (and desirable) in various contexts.

Sealioning trolls pretend not to understand. Several groups of neurodivergent people have trouble understanding this through no fault of their own.

I repeat: either you're a sealioning troll, or you're neurodivergent. Which is it?

391Petroglyph
Dic 11, 2021, 10:22 pm

By the way, faktorovich. You still owe me an apology for all those lies your lying self told about me.

392Petroglyph
Dic 11, 2021, 10:30 pm

>389 amanda4242:
Are you enjoying the book?

393AnnieMod
Dic 11, 2021, 10:37 pm

>371 drneutron: Thanks! I totally forgot that this is there. :)

>372 Petroglyph: Thanks!

394amanda4242
Dic 11, 2021, 10:39 pm

>392 Petroglyph: Very much, although I wouldn't say it's particularly pleasant reading.

395faktorovich
Dic 12, 2021, 12:04 am

This message has been flagged by multiple users and is therefore no longer displayed.
>390 Petroglyph: Your conclusion: "scholarly literature written for experts is hard (or even impossible) to comprehend by non-specialists." The computational-linguists you have cited assume this is the case. This is why they leave blatant errors in their basic math, assuming that all readers will not review the details just because they use terms like n-grams that are foreign to an average reader. This assumption that no "non-specialist" or nobody outside of the group of collaborating computational-linguists will be able to or would want to read their research is the reason these mistakes are left without even basic attempts to disguise them. The point of these articles is discrediting computational-linguistic attribution, and thus any errors left, if they are discovered, can help these same researchers to later argue that these errors mean this group of attribution methods never works and is fraudulent. In contrast, my method and my data are an open book and simple for all to grasp because my goal is to create an attribution method that is free and accessible to all in the public, and not to bar readers from understanding it, or to disguise that it is mishandling data. If the goal of these researchers is similar to your goal of finishing an analysis in a "Lunch"-break, or as quickly as possible, then they just plug the data into software; checking if it introduces errors, or if the formula it uses is nonsensical, is too much of a burden.

You are using a "sealioning" strategy, as you are trolling me by repeatedly asking me for additional evidence to support my methodology. However you view what you are trying to do, the discussion of the errors in the standard methodology in computational-linguistics is precisely what I would like to discuss in this forum. The result so far is an overwhelming quantity of free information I have provided here that any reasonable reader would be convinced by. These reasonable readers are not likely to join this discussion themselves because based on what you and your team of messengers have been saying so far would lead them to assume that you would immediately personally attack them if they took my side. This is why I have not invited anybody I know or my affiliates etc. to join this discussion, so that nobody else has to deal with being attacked. The strategy of "sealioning" would only work if the person being questioned for more evidence does not have more evidence to provide, but since I have been researching this subject for years, the well of other points I can raise is pretty inexhaustible, as you might eventually find out.

Your statements about me being "neurodivergent" are clear violations of the no-personal-insults policy on LibraryThing, since this term means "differing in mental or neurological function from what is considered typical or normal (frequently used with reference to autistic spectrum disorders)", i.e., you are calling me "autistic". There is no more direct form of insult out there than this, but I am not flagging your post because it would disappear under flags, and I think it will help readers, if it remains visible, to understand that you are just a bully here calling me bad names.

396spiphany
Dic 12, 2021, 6:02 am

>395 faktorovich:
Asking if someone is neurodivergent is only an insult if the asker is using it to imply that the person in question is somehow worse or lesser than people who are neurotypical.

In itself, the term is a neutral descriptor used in professional circles to describe the fact that some people's brains are wired in ways that are distinct from the majority of the population, and that this often creates difficulties for these people in everyday life (e.g., trouble understanding things like tone). I think some people prefer "neurodiverse" or "neuroatypical" to avoid the implication that their diagnosis is a disability, but "neurodivergent" isn't an insult.

I read Petroglyph's suggestion as an attempt to understand why this conversation has developed the way it has. In other words, different brain wiring might be one possible explanation for why many things that seem obvious to Petroglyph aren't obvious to you, and why you seem to have such difficulty communicating with other members in this thread.

For my part, in many places in this thread, I have (for whatever reason) found it very difficult to follow your logic or reasoning. It doesn't correlate with my knowledge of the world, of writing and publishing, or of good scholarly argumentation. For what it's worth, I am trained in literary criticism, with some background in classical philology (i.e., the basics of textual criticism), philosophy, and linguistics. I'm not a mathematician. My work involves improving the English of academic texts written by non-native speakers in fields I don't have training in (mostly social sciences, but sometimes law, biology, energy technology, etc.). I'm pretty good at following the general shape of the argumentation of specialist texts even when I don't understand the technical aspects enough to judge the accuracy of their math or other details. The published articles that others have cited and linked to make sense to me in this general way. The comments of others in this thread make sense to me in a general way.

I read your posts and end up feeling more confused and disoriented than before. For someone who believes that scholarly arguments should be transparent and accessible to "any reasonable reader", you don't seem to be succeeding very well.

On your claim that those participating in this thread are colluding to hound and attack you: I don't know Petroglyph, except insofar as our paths have occasionally crossed on LT. I know one or two of the other posters somewhat better, but we have not discussed our posts in this thread, nor do I feel any obligation to agree with them out of a sense of loyalty. It is, frankly, a bit insulting that you believe none of us are able to think for ourselves or express our own opinions.

The fact that many participants in this thread have continued to engage with you is not the result of a conspiracy to waste your time and attack you. I see it as more of a sign of good will -- rather than simply dismissing ideas that at first glance seem very far-fetched and backed up by very tenuous evidence, they are giving you a chance, repeatedly, to demonstrate in ways that we can understand that your method works and your claims have validity and are robust enough to stand up against critique (much milder critique, may I add, than you would likely encounter during peer review).

Since you've mentioned the difficulty of conducting research as an independent scholar, I want to note that I have sympathy for these challenges. It isn't easy being outside the communities of scholars researching the same thing you are, or being cut off from the resources that come with being part of academia.

However, I want to gently suggest that, independent scholar or not, there is no excuse for not being up to date on the current research and state-of-the-art in your specific field of investigation. It's not unusual for scholars to end up researching something that is somewhat outside their original area of expertise, and if they're a good scholar, they will deal with this by filling in their knowledge gaps, through training, classes, reading. (A statistics class taken decades ago as an undergraduate doesn't count as "up to date".) Not being familiar with the current methodologies -- which you don't seem to be -- doesn't convince anyone of your ability to contribute insights to the field in question.

Likewise, not having access to the literature or computer programs is also not a convincing excuse. Many university libraries are accessible to members of the general public; in many cases they even offer accounts -- for a fee -- to persons not associated with the university that make it possible to check out books, access the library databases, etc. Computer programs can be purchased, and, like library fees, are simply a part of the business costs of working in your chosen area. An auto mechanic has to pay for her tools, she has to familiarize herself with new car models and technologies, or she won't be able to continue to serve her customers. Intellectual work isn't fundamentally different in this respect.

>385 faktorovich:
I'm sorry that you aren't able to escape into books simply for pleasure anymore, but only read them to "find the underlying truth about the authors". Love of the stories authors tell and the worlds they create was the reason I studied literature in the first place, and I would be very sad if this enjoyment were ruined as a result of my research activities.

397andyl
Dic 12, 2021, 7:05 am

>390 Petroglyph: "... Multiplying? I don't split letters into trigrams. I just tell the software to look at strings of three characters. "

Yep, I think that there are numerous problems with faktorovich and her understanding of computers and what they can do well. The whole aspect of studying every 3-gram in a book-length work (sliding the 3-character window forwards 1 character at a time) would just not be doable with a manual (or semi-manual) process in a reasonable timeframe. Of course with the right software it is relatively easy and built in to stylo (indeed you can choose what n you want for n-grams).

398susanbooks
Dic 12, 2021, 9:48 am

>385 faktorovich: "I do not escape into books to find myself, but rather read books to find the underlying truth about their authors."

Wow, Barthes & Foucault passed you right by, huh?

399Keeline
Dic 12, 2021, 11:49 am

>367 Petroglyph: This line jumps out at me:

"Authors' birth dates range from the early 1500s through the late 2000s."

I know time flies but this can't be what is meant. We are only seeing the pre-dawn of 2022 on the horizon.

Unless, of course, some time travel prescience is involved here :)

I would say that at least some of the Gutenberg "birthdates" are not reliable for authors in my field. I don't think it would change very much.

Editors in the past few decades have seemed to prefer fewer punctuation marks rather than more. Compound sentences are broken into two or more simple ones. The complaints against the "Oxford comma" are another case where the use of that mark might be on the decline in terms of published works.

In the story paper and dime novel era, the text was often in narrow columns in periodical-like fashion. An entire paragraph can consist of a 3 or 4-word quotation without a signifier of who spoke it. In contrast, the early 20thC preferred style looked for longer sentences and paragraphs by comparison. We see this when we look at some stories that were edited and republished in the latter form.

If spelling was fluid in the Early Modern period, wouldn't punctuation be even more so? Plus the punctuation may be to conform to preferences of the time of publication.

For these reasons, I am not quite convinced that punctuation is a silver bullet test of authorship. There are too many people involved in the decisions of what is published.

Here is a fun take on looking at punctuation: What I Learned About My Writing By Seeing Only The Punctuation by Clive Thompson (Oct. 7, 2021).

James

400nonil
Dic 12, 2021, 12:36 pm

>395 faktorovich:
The reason I see for the miscommunication between Petroglyph and yourself seems to be that you do not trust any data or information that you cannot verify yourself.
Of course, the replicability of results is vital for science and other research fields to function, so in itself that is a useful impulse. Unfortunately, it seems that you do not have the programming and statistics experience necessary to verify the research that has been cited so far yourself, which seems to result in you distrusting them entirely.
Frankly, this comes across as rather paranoid.
And considering you are working in computational linguistics, not particularly helpful, as it shuts you out of considering almost all previous work in the field.
As I see it, you have two options: make an attempt to learn the programming and statistics necessary to check the efforts in these papers, and then evaluate them, rather than dismissing them as being "too opaque", or avoid working in a field in which you refuse to engage with existing research.

401faktorovich
Dic 12, 2021, 1:08 pm

>396 spiphany: "For someone who believes that scholarly arguments should be transparent and accessible to 'any reasonable reader', you don't seem to be succeeding very well." Your point is that my arguments are more complex and require more concentration to understand them; this is true. My point is that anybody who concentrates enough to dissect my arguments into their pieces will find overwhelming evidence that they are based on facts and are logical; whereas, if anybody (as I have done) concentrates hard enough on any of these rival computational-linguists' articles, they will find the falsehoods and irrationality. I have succeeded in demonstrating "in ways that we can understand that your method works and your claims have validity and are robust enough to stand up against critique". But you are suggesting that my status as an "independent scholar" means that I am "not being up to date on the current research and state-of-the-art in your specific field of investigation." I am up to date on scholarship in this field. I cite most of the main researchers in computational-linguistics in Volumes 1-2 of my series, which Petroglyph has now requested for review, so he should have found these citations by now. I do not agree with the methodology of these computational-linguists that I cite and we have been discussing in this string. I have explained the faults I have found with their method in Volumes 1-14 and across this thread. My criticisms of the errors of these insiders in computational-linguistics is the actual fault that is leading to these waves of extremely hostile insults from your group in this thread. If I had not been up-to-date on research in this field, I would not have been able to provide the rebuttals I have used across this discussion. All of my counters in this discussion are based on previous research I have done in this field by testing these rival computational-linguistic theories and findings with my own method and finding faults in them. I have demonstrated in this discussion that Stylo creates numerous glitches, and does not provide anything but word-frequencies in its basic program that is somewhat accessible to the public; as Petroglyph has explained each researcher has to create their own program in Stylo to apply any other tests. I have also explored dozens of other computational-linguistic software packages that researchers in this field advertise in their articles, and I have found all of them to be inaccessible not only because most are behind pay-walls, but because some require special permission from the creator for a researcher to use them. I have explored all alternative methods, and have derived my 27-method as the best possible method of authorial-attribution using computer software. I enjoy research (including reading) more than any other activity on this planet; this is why I have been doing it pretty much for free for a decade.

402amanda4242
Modificato: Dic 12, 2021, 3:07 pm

faktorovich, please stop. For your own sake, please stop.

You have convinced nobody here of the validity of your findings because you have not convinced anyone here of the soundness of your methodology. People who have no stake in upholding the status quo of computational linguistics or Renaissance literature have looked at your work and not been convinced. Further, your replies have brought into question your knowledge of the field of computational linguistics and your ability to conduct competent research in it. And please don't blame our doubt on Petroglyph: we're all capable of reaching our own conclusions about your work.

You obviously are not getting the response you hoped for out of this interview, so I would suggest for your own peace of mind that you click the "Ignore This Topic" button at the top right of this page and move on. Perhaps you could spend the time you've been dedicating to this thread to rediscovering the joys of reading for pleasure.

403SandraArdnas
Dic 12, 2021, 3:46 pm

>401 faktorovich: Your point is that my arguments are more complex and require more concentration to understand them; this is true.

That was not spiphany's point, nor is it true. I am trained in linguistics and literature and your arguments are not complex, but muddled, unclear and confusing. That is not a result of their complexity, but of your way of thinking, which isn't particularly logical. Also, you never really respond to questions and issues people raise, but rather you have your own story which you keep verbalizing and nothing anyone ever says seems to register at all. It's a monologue and it's anyone's guess why you need an audience at all.

Finally, kindly stop claiming that people are insulting you. Personal insults are not allowed and are promptly flagged by people following the thread. If you insist someone insulted you, point to the post in question. Otherwise, stop with the nonsense. I suspect you think criticism of your work is a personal insult and attack, in which case I wonder how you gained your credentials. It is quite unbelievable that one would earn a PhD and not consider giving and receiving criticism of scholarly work as normal.

404faktorovich
Dic 12, 2021, 5:03 pm

>403 SandraArdnas: I have addressed every question and argument in this thread relevant to my research. My goal is to teach those reading this discussion about the computational-linguistics author-attribution method I have invented, and the history-changing findings it has led me to in the Renaissance and other periods. I will continue to answer any questions asked, and especially those that aim to understand my arguments better. If there is anything about my arguments that has confused you, please specify what points these are and how I can help you understand them better.

405SandraArdnas
Dic 12, 2021, 9:27 pm

>404 faktorovich: No, you haven't. You haven't addressed a single objection and question about your methodology. You just keep parroting your own story without addressing anyone's arguments, questions, objections or whatever. You've been talking to yourself, not the people in this thread

406reading_fox
Dic 13, 2021, 6:17 am

>387 Petroglyph: - Is there a most specific gram length? If you re-ran that clustering with 1, 2, 3, 4, 5... gram lengths, would you get better or worse results with longer grams?

>383 Petroglyph: - wave without a shore is another great story about the philosophical position regarding strong-willed characters who believe so strongly in their own version of reality that the World At Large has no choice but to yield. It's only novella length so quite accessible.

407spiphany
Modificato: Dic 13, 2021, 12:53 pm

>401 faktorovich:: The reason I suggested that you are not up to date is not because you are an independent scholar (a status you have emphasized in relation to the challenges it entails; I frankly don't care about your affiliation or lack thereof). Rather, you have demonstrated repeatedly in this thread that you are not familiar with many of the methods and technologies used by others currently working on stylometrics and computational linguistics.

It is, of course, your right to use your own methodologies, but part of being up to date is knowing what others in your field are doing, how, and why -- even if you choose to make an informed decision to reject these methods. (The key word here being "informed").

However, your rejection of these methods, at least in this thread, seems to be based, at least in part, on your lack of mastery of the programs in question. (To wit: not having heard of R, not being able to follow fairly simple instructions on how to use it, not being familiar with different types of character encoding. You mention "not having access" to a cited article as an excuse for not reading it in depth, even though a non-paywalled version of the article was readily findable online -- a skill that any scholar who has needed to track down articles while away from access to a library database will have acquired.)

It is also telling that your response to the studies or explanations others have cited in this thread very much resembles that of someone encountering these ideas for the first time. It is equally telling that you never mention other recent research whose methods you are building on or adapting, or other scholars whose work you do agree with. I mention this not because of some unquestioning veneration of the hallowed halls of academia, but because it would help situate your work in a wider context. Nobody, even the most brilliant scholar, creates their ideas completely from scratch.

Regarding the comprehensibility of your arguments: you simultaneously say that you want your ideas to be "understandable to any reasonable reader" and yet claim that I have not understood them because your "arguments are more complex and require more concentration to understand them". You do see the contradiction here? An idea can't be both accessible and too complex to summarize effectively.

As I noted, my job is reading and making sense of complex and difficult to understand texts by unskilled authors on topics I don't necessarily have much background in. I thus have lots and lots of practice with the required "concentration" and should therefore be ideally prepared to understand the ideas of a successful scholar in a field (literature) that I actually do know a little bit about.

Yet I quickly find myself lost when reading your explanations here.

Here are a few of the difficulties I have with your argumentation (not the math, which I'm not qualified to speak on):

1) Logical flaws, the biggest of which could be summarized as "correlation does not equal causation": i.e., a test that identifies similarities between two works does not prove anything about authorship; it only identifies ways in which the works are similar. It's true that in some cases, one reason might be that they have a common author, but you do not discuss or even admit that there might be other possible explanations for similarities. This brings me to my next point:

2) Black-and-white, absolute thinking: your approach does not allow for any grey areas, any results that are not definitive. This is, particularly when working with statistics that deal in probabilities and percentages rather than absolutes, a rather problematic position. Your claims would be much stronger and more convincing if you included a discussion of various explanations for the similarities you find and an assessment of which ones are most likely, and why -- particularly since the claims you are making are so radical.

3) The principle of parsimony, or "Occam's razor": in essence, this is the idea that a theory which requires a whole host of secondary postulates or algorithms to make it work is less likely to be true than one which does not. Your model violates this principle because it requires introducing previously unknown factors whose existence is unproven -- in this case, ghostwriters -- in order for your theory to work. This isn't to say that your theory is impossible, but it requires adjusting a whole series of other things, many of which may in fact be accepted at present precisely because there is substantial evidence that supports them.

4) Lack of context: you mention repeatedly how your calculations supposedly prove the existence of ghostwriters, but there is a striking lack of discussion of other forms of evidence that support this. Part of making an argument is not just repeating the conclusions over and over, but explaining how you got there in relation to previous knowledge.

5) Unwillingness to admit error or adjust your claims based on new evidence: this really doesn't engender confidence in your work. Nobody knows everything, nobody is infallible, and scholarly knowledge is always to some degree a work in progress, constantly being adapted and expanded and corrected. Any scholar worth her salt will acknowledge flaws and things she overlooked and integrate these insights into future work. A scholar who digs her heels in and responds to critique by claiming that she is being discriminated against (on account of her gender, her ethnicity, her outsider status, etc. etc.) is unlikely to earn the respect of her colleagues.

408susanbooks
Dic 13, 2021, 1:58 pm

Nicely said, >407 spiphany:

Throughout this thread the line, "When you hear hoofbeats, don't think of zebras," has been going through my head. (A colloquial version of Occam's razor: think of the obvious (horses), not the unusual (zebras).) Unless you're on African grasslands, in which case all bets are off.

409faktorovich
Dic 13, 2021, 2:00 pm

>407 spiphany: I cite the following sources just in the "A New Computational-Linguistics Authorial-Attribution Method Described and Applied to the British Renaissance" chapter of Volumes 1-2 of my Re-Attribution series. This book is 698 pages long, and most chapters have more citations than this. I cite various other computational-linguistic studies in the individual chapters that address specific texts being reattributed out of the 284 I tested, etc. Additional citations and evidence are presented in Volumes 3-14, which all together make 14 books I have written on this subject related to the British Renaissance alone.

Scott McCrea, “Two Shakespeares: A Skeptical Analysis of Shakespeare and His Works Reveals the Real Author”, Skeptic, Vol. 9, No 4, 70-5.
Olivia Serres, The Life of the Author of the Letters of Junius, the Rev. James Wilmot (London: E. Williams, 1813).
Nathan Baca, “Wilmot Did Not”, Shakespeare Matters, 2 (Summer 2003).
William Spalding, A Letter on Shakespeare’s Authorship of The Two Noble Kinsmen, a Drama Commonly Ascribed to John Fletcher (Edinburgh: Adam and Charles Black, 1833).
Samuel Hickson, Westminster and Foreign Quarterly Review 48 (1847), 59-88.
Joseph C. Hart, The Romance of Yachting: Voyage the First (Google Books; New York: Harper & Brothers, Publishers, 1848).
Samuel Hickson, “A Confirmation of Mr. Spedding’s Paper on the Authorship of Henry VIII”, Notes and Queries 2 (43, 1850), 198.
James Spedding, “Who Wrote Shakspere’s Henry VIII?”, Gentleman’s Magazine, New Series (HathiTrust; 34, 1850), 115-23, 381-2.
Gabriel Egan, “Chapter 2: A History of Shakespearean Authorship Attribution”, Authorship Companion, G. Taylor & G. Egan, eds., The New Oxford Shakespeare (Oxford Scholarly Editions Online; Oxford: Oxford University Press, 2017), 29.
John Burrows, “All the Way Through: Testing for Authorship in Different Frequency Strata.” Literary and Linguistic Computing (22, 2007), 27-47.
Egan, “Chapter 2”, 44.
Ahmed Shamshul Arefin, Renato Vimiero, Carlos Riveros, Hugh Craig, and Pablo Moscato, “An Information Theoretic Clustering Approach for Unveiling Authorship Affinities in Shakespearean Era Plays and Poems”, Public Library of Science (PLoS) One 9 (10, 2014), e111445.
Delia Bacon, “William Shakespeare and His Plays; an Inquiry Concerning Them”, Putnam’s Monthly (January 1856, No. 22).
William Henry Smith, Bacon and Shakespeare: An Inquiry: Touching Players, Playhouses, and Playwriters in the Days of Elizabeth (Hathi Trust; London: J. R. Smith, 1857), 18.
Elizabeth Winkler, “Was Shakespeare a Woman?”, The Atlantic (Washington, DC: The Atlantic Monthly Group, June 2019, 86-95; EBSCO Academic Complete).
Edmund G. C. King, “Cardenio and the Eighteenth-Century Shakespeare Canon”, The Quest for Cardenio, eds. David Carnegie and Gary Taylor (Oxford: Oxford University Press, 2012), 87.
Diana Price, “Shakespeare’s Authorship and Questions of Evidence”, Skeptic, Volume 11, Number 3, 2005 (Skeptic Society, 10-5).
Eliot Slater, The Problem of “The Reign of King Edward III”: A Statistical Approach (Cambridge: Cambridge University Press, 1988), 97.
Brian Vickers, Shakespeare, “A Lover’s Complaint” and John Davies of Hereford (Cambridge: Cambridge University Press, 2007), 193, 215, 265.
Vickers, Shakespeare, 136, 138, 143, 151, 210.
MacDonald P. Jackson, Defining Shakespeare: “Pericles” as Test Case (Oxford: Oxford University Press, 2003), 83, 88.
Sir Leslie Stephen and Sir Sidney Lee, eds., Dictionary of National Biography, Volume 3 (London: Smith, Elder & Co., 1908), 575.
Frank Kidson, British Music Publishers, Printers and Engravers: London, Provincial (London: W.E. Hill & Sons, 1900), 79.
Thomas Heywood, “To the Reader”, The English Traveler (Internet Archive; London: Robert Raworth, 1633), 5.
Sidney Lee, Ed., Dictionary of National Biography, Vol. XLV, Pereira-Pockrich (Google Books; New York: MacMillan and Co., 1896), 416.

The definition of "informed" is "showing knowledge", so these citations are clear evidence that I have shown my knowledge of the studies in this field. I have also shown my "knowledge" of the various studies that Petroglyph has cited by explaining not only what they are saying, but also the errors in their approach across this discussion. Without reading my Re-Attribution series, you are the ones who are not "informed" about my method, or my knowledge. I have heard of R before; the questions I have asked about Stylo/R in this discussion were designed to reach the conclusion I expected: that they would not be able to give a set of simple steps for how to use these tools to actually perform a linguistic analysis; proving that these tools are inaccessible and only perform word-frequency-analysis in their standard functions, by the admission of those who use them (Petroglyph), was the goal, which I clearly achieved. A good teacher asks students to say what they know to discover the gaps in what they do not know, and this was the point of my questioning of Petroglyph on R/Stylo.

Yes, the point of me being in this space is for you to ask me questions on the points that you find confusing, and I will explain them or re-explain them in detail until they are clear to you. Here are answers to your questions.

“Correlation doesn’t equal causation”: even if 2 works are similar to each other, how can this determine authorship? My study of the British Renaissance is unlike the smaller studies I have done of other periods. In this century, I analyzed not only the 284 texts that I actually ran through the 27-tests, but also hundreds of other relevant texts (which I discuss across my translations), and hundreds of other minor bylines that comprise nearly all of the printed bylines in texts published during this period. After I had the linguistic matches/non-matches and had the texts grouped into similar-texts clusters, I used the first and last publication/performance dates in each of these clusters to exclude the hundreds of alternative bylines whose holders were not alive from start-to-finish to have written all or most of the texts that are in their group. Lifespans during this century were short, with the 40s being the mid-lifespan, so most were thus logically excluded. I tested 104 bylines that include all of the major “authors” that have been mentioned in studies of this period. An equivalent would be if there was a college on an isolated island without an internet connection with 2,000 students/teachers etc. on it, and there was evidence that all of their writing was performed by only six of them, and I had firm evidence to exclude 1,896 of them because they had never logged into a computer etc. Then I tested the remaining 104 bylines and performed research into their school records, prior writing histories etc. to conclude that 98 of them must be excluded as potential ghostwriters, leaving the 6 that must be the ghostwriters I am seeking. And as I looked into the records, histories, surviving documents, handwriting analysis, financial statements etc. of these 6, this linguistic attribution using the 27-tests method was re-affirmed. The handwriting samples in signatures between manuscripts attributed to the “Sidneys” (Mary and Philip), for example, match Josuah Sylvester’s, who was known to have worked for this family, before he became the official Court Poet. There is a record of William Percy taking out a loan for £2,400 just before his pseudonymous investment in theater/troupe building after Elizabeth granted the London theater duopoly to a man using the “Shakespeare” pseudonym. You would only see a logical fallacy in this reasoning if you are only reading my interview/conclusions, and not at least Volumes 1-2, where I explain the evidence why the six ghostwriters I have found to have done it really did it. This would be like objecting to my investigation of who the ghostwriters were on the island by only reading my conclusion and objecting that just because their styles match other papers, it doesn’t mean they did it.
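
Mechanically, the lifespan-based exclusion step described above is a simple filter over dates. Below is a minimal sketch of that logic in R, using entirely hypothetical bylines and dates; none of these values come from the study itself:

  # Sketch of the date-based exclusion logic, with hypothetical data only.
  # A byline stays a candidate for a cluster only if its holder was plausibly
  # active from the cluster's first to its last publication/performance date.
  candidates <- data.frame(
    byline     = c("Byline A", "Byline B", "Byline C"),
    birth_year = c(1545, 1570, 1520),
    death_year = c(1600, 1635, 1575)
  )

  cluster_first_pub <- 1590
  cluster_last_pub  <- 1610
  min_writing_age   <- 15   # assumed earliest age at which a candidate could write

  active_from <- candidates$birth_year + min_writing_age
  active_to   <- candidates$death_year

  viable <- candidates[active_from <= cluster_first_pub &
                       active_to   >= cluster_last_pub, ]
  viable$byline   # only bylines whose active years span the whole cluster remain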

Grey areas are discussed whenever they occur in the series. These discussions include this whole section in Volumes 1-2 with its four chapters:

Part IX: Studies in Exclusion of Potential Authorial Bylines
George Chapman: De-Attributing a Ghostwriter-Contracting Debtor
Nicholas Breton: Distinguishing Pseudonyms in Coded N. B. and B. N. Initials
Anthony Munday: Divergences Between the Thief and the Ghost Behind the “An. Mundy, Citizen and Draper of London” Byline
The Fletchers and the Beaumonts: Two Families of Ghostwriter-Contractors

Each of these chapters explains why Chapman, Breton, Munday, Fletcher and Beaumont were uniquely difficult to exclude until I dug far enough into their biographies and documentary records to conclude beyond doubt that they could not have been the writers. You have to read these chapters to understand why these conclusions are beyond doubt. I can send a review copy to you, if you want to read them. Across the rest of the book, I include reasons for exclusion for all 98 bylines that I tested and concluded were not the ghostwriters.

Occam's razor: simplest explanations are usually best. I include several chapters where I present a re-interpretation of historical facts that explain why my version of the writerly history of this century is accurate, while rival theories are far more fictitious. Here are the four chapters that uniquely explore historical parameters related to the theater; other chapters cover other genres.

Part II: The Birth of the British Theaters
“Philip Henslowe’s” Financial Schemes as a Theater Landlord
Manipulation of Theatrical Audience-Size: Nonexistent Plays and Murderous Lenders
Crime and Corruption Behind the Ghostwriting Workshop
Masters and Minstrels in the Renaissance Theater: Patronage for Propaganda

You are claiming that the presence of ghostwriters is unknown, but this is not true. More than half of the texts published in these decades were initially anonymous, thus assuming they were written by ghostwriters is the most rational assumption. Leaping to assigning hundreds of different bylines to anonymous texts because they intuitively seem similar to hundreds of other texts (this is what scholars have been doing for 400 years) is far more far-fetched than systematically finding all texts that are similar between them and assigning them quantitatively to the most likely ghostwriters behind them.

Lack of context: The context is provided across Volumes 1-2. It would be absurd for me to make the entire book free by posting it all here. You guys have also misunderstood pieces of evidence I have been presenting in this discussion as if they are the only evidence I have in the book vs. just small bits of evidence. So just ask me for a review copy of the series, read the context, and then decide if you need additional context, and if so post what you need here, and I will provide answers. I do not want to repeat any piece of evidence, but you guys have been repeating your questions, so I have been assuming I had to rephrase the answers.

Scholarship is constantly being adjusted. I absolutely agree. This is why I adjusted my findings hundreds of times across the past 2 years as I worked on Volumes 1-2 until I could not find any other objections of my own or among peer-reviewers that I had not already answered. Once I was absolutely certain of my findings, I published the series, and that’s what it is. It would be great if other scholars added to it, but first they would actually have to read my findings. I do not care at all about “respect”; I only care about my research and about providing overwhelming proof to support it, as I have done.

410susanbooks
Dic 13, 2021, 2:01 pm

Anyone can write a bibliography. There are even programs that do it for you. A bibliography proves nothing.

411spiphany
Dic 13, 2021, 3:29 pm

Please stop telling me to read hundreds of pages of your self-published books when you haven't managed to convince me that your arguments are worth reading in the first place.

If we've "misunderstood pieces of evidence" because you have not provided enough information to understand how to interpret that evidence, that's on you, not on us. This post is the first time you've described in any meaningful way anything about how your work connects up with either literary history or the historical record.

One of the most fundamental skills of being a scholar is learning how to summarize your work in such a way that the essence of your arguments is captured -- not just your conclusions, but also how you arrived at those conclusions as well as things like background information and the debates in the literature. Obviously the type of background information and the level of detail you provide will depend on who your readership is: in this case, interested laypersons, who just possibly might need a bit more explanation about some of the more basic aspects than scholars of English renaissance literature. It isn't necessary to provide all your evidence in exhaustive detail -- and nobody here is asking you to! -- but we do need enough to get a sense of how your claims fit into existing knowledge. Surely I don't have to explain this to a writing teacher and a publisher?

(By the way, I'm still waiting to hear about some of the alternative explanations you considered that don't involve ghostwriting, and why you ultimately rejected them.)

I'm not sure what purpose (pedagogical or otherwise) it would serve to "test" other participants in this thread to see whether they could explain to you how to download and install a computer program, or why you would pretend to not know that Stylo is free or what character encoding is. I'm not sure which I find worse -- the idea that you're lying right now and pretending that the unmistakable confusion expressed earlier in this thread was not genuine, or the idea that you've been disingenuous all along and toying with people in this thread who have been asking questions in good faith in an attempt to understand your research. Maybe I'm naive, but surely if you are really so familiar with R and other stylometric tools currently in use, it would have been more productive to engage with your discussants in this thread as equals, or at least intelligent human beings, and explain from the start what work you had done using R and Stylo and why you concluded they weren't suitable.

412lilithcat
Dic 13, 2021, 3:42 pm

>409 faktorovich:

You are claiming that the presence of ghostwriters is unknown but this is not true. More than half of the texts published in these decades were initially anonymous, thus assuming they were written by ghostwriters is the most rational assumption

Wait, what? A ghostwriter is someone who pens a work published under another person's name.

Also, an assumption (rational or not) is not evidence. Please provide actual evidence of the existence of ghostwriters during this period.

413amanda4242
Dic 13, 2021, 4:45 pm

If anyone is interested, waltzmn wrote detailed reviews of two of the books in the Re-Attribution series. They're identical until the last paragraphs, then they cover the specific volumes.

https://www.librarything.com/work/27242598/reviews/209142318
https://www.librarything.com/work/27242596/reviews/209143017

414faktorovich
Dic 13, 2021, 4:47 pm

>412 lilithcat: I use the term “ghostwriter” in a sense that is a unique variant of the current dictionary definition. My definition is grounded in lines such as these in “The Grounds of Divinity, plainly discovering the Mysteries of Christian Religion” (1633): “The holy scriptures are all those Books of the Old and New Testament, by the direction and inspiration of the Holy Ghost, written, or approved by the Prophets and Apostles.” Or in the commentary on “Psalm XCIII” in “Holy Bible” (1635): “For in human books the writer and author is all one; but in divine, the Holy Ghost is the proper author, and a man is the writer.” The non-theological application of a similar meaning is found in a byline, “Written by Thomas Nash his Ghost, with Pap with a Hatchet” (1642 edition). All of the tested (out of 284) texts with the “Nash”/“Nashe” byline matched Richard Verstegan’s linguistic signature, as did most of the theological commentary like the two 1630s biblical criticisms cited above. Thus, it is very likely that Verstegan ghostwrote all three of these texts where he equates the “Ghost” with the author while insisting there is a distinction between these being two entities vs. an author who is not speaking through a ghost being merely one entity. All three of these usages become funnier and more revealing the more Renaissance literature one reads, as one learns that many “famous” “authorial” bylines (including “Philip Sidney” and “Christopher Marlowe”) appeared for the first time after these “authors’” deaths; in these instances, it seems that their “ghosts” became authors when they were already dead and thus could not have written the books attributed to them themselves. I explore several of these ghostly apparitions across the series. This is just to answer why I am referring to these six writers as ghostwriters, even in cases where they were not hired by the byline-holders to write these various texts, wrote half of this period’s texts anonymously, or worked under the various other types of circumstances I explain in the series.

415faktorovich
Modificato: Dic 13, 2021, 5:05 pm

>411 spiphany: I can only answer questions that are asked. Having written 14 books on this subject, I have 2,500 pages of information I could post here, but there is no space for this much evidence, especially since you are refusing to read even hundreds, let alone thousands, of pages of my research. Thus, if you ask new questions, I will post new information. You are asking for me to summarize my argument now. I include a summary of the argument on this series' main page: https://anaphoraliterary.com/attribution/.

This series solves most of the previously critically discussed mysteries concerning the authorship of British Renaissance texts (including the “William Shakespeare” and 103 other bylines) by applying to 284 of them a computational-linguistics method newly invented for this study, which uses a combination of 27 different tests to derive that six ghostwriters were their authors: Richard Verstegan, Josuah Sylvester, Gabriel Harvey, Benjamin Jonson, William Byrd and William Percy. This computational method as well as structural, biographical and various other attribution approaches that led to the attribution conclusions are discussed in Re-Attribution of the British Renaissance Corpus. A larger portion of this series is Modernization of the Inaccessible British Renaissance, which tests the quantitative attribution-conclusions by closely analyzing and explaining the contents of re-attributed texts that are uniquely significant for the revised history of this period, and yet have never been translated into Modern English before. Some of these texts were initially anonymous, others were self-attributed by the ghostwriters, and yet others were credited in bylines to pseudonyms or ghostwriting-contractors. The annotations to each of their translations provide thousands of new confirming clues of shared authorship within a given authorial-signature. Even without this history-changing attribution evidence, these are neglected texts that are here edited for the first time to allow their beauty and intelligence to shine so that readers can see how they rival the standard “Shakespeare” canon. This series is cataloged in the World Shakespeare Bibliography and in the Play Index (EBSCO). The Journal of Information Ethics published two articles on Faktorovich’s re-attribution method: “Publishers and Hack Writers: Signs of Collaborative Writing in the ‘Defoe’ Canon” (Fall 2020) and “Falsifications and Fabrications in the Standard Computational-Linguistics Authorial-Attribution Methods: A Comparison of the Methodology in ‘Unmasking’ with the 28-Tests” (forthcoming around Spring 2022).

Re-Attribution of the British Renaissance Corpus

The first accurate quantitative re-attribution of all central texts of the British Renaissance.

Describes and applies the first unbiased and accurate method of computational-linguistics authorial-attribution.
Covers 284 texts with 7,832,156 words, 104 authorial bylines, a range of genres, and a timespan between 1560 and 1662.
Includes helpful diagrams that visually show the quantitative-matches and the identical most-frequent phrases between the texts in each linguistic-signature-group.
Detailed chronologies for each of the six ghostwriters and the bylines they wrote under, including their dates of birth, death, publications, and other biographical markers that explain why each of them was the only logical attribution.
A full bibliography of the 284 tested texts.
All of the raw and processed data, not only in summary-tables inside of the book, but also in-full on a publicly-accessible website: https://github.com/faktorovich/Attribution.
One table includes all of the data from the first-edition title-pages (byline, printer, bookseller, date, proverbs), and the first-performance (date, troupe).
A table on structural elements across all “Shakespeare”-bylined texts summarizes their plot-movements, character-types, settings, slang-usage, primary sources, and poetic design (percentage of rhyme and hendiadys).
To explain why these are the first truly accurate re-attributions, numerous reasons for discrediting previous attribution claims are provided throughout.
Re-Attribution of the British Renaissance Corpus describes a computational-linguistics authorial-attribution method newly invented for this study and applies it and several other approaches to the central texts of the British Renaissance. All of the attribution steps are described precisely to give readers replicable instructions on how they can apply them to any text from any period that they are interested in determining an attribution for. This method can be applied to solving criminal linguistic mysteries such as who wrote the Unabomber Manifesto, or theological mysteries such as whether any of the Dead Sea Scrolls might have been forged by a modern author. This method is uniquely accurate because it uses 27 different quantitative tests that measure a text’s dimensions and its similarity or divergence to other texts automatically, without the statisticians being able to skew the outcome by altering the experiment’s analytical design. Re-Attribution guides researchers not only on how to perform the basic calculations, but also on how to perform the biographical and documentary research to derive who among the potential bylines in a single signature-group is the ghostwriter, while the others are merely ghostwriter-contractors or pseudonyms. Reliable accuracy is achieved by also performing other types of attribution tests to check if these alternative approaches validate or contradict the 27-tests’ findings. Non-quantitative tests discussed include deciphering the hidden implications of contemporary pufferies, as well as comparing structural elements such as characters, plot, and element borrowings. Part II presents a revised version of the history of the birth of the theater in Britain by reviewing forensic accounting evidence in Philip Henslowe’s Diary, and the documented history of homicidal lending practices and government corruption connected with troupes and theaters. Parts III-VIII explain precisely how this series derived that the British Renaissance was ghostwritten by only six linguistic-signatures: Richard Verstegan, Josuah Sylvester, Gabriel Harvey, Benjamin Jonson, William Byrd and William Percy. The parts on each of these ghostwriters not only explain how their biographies fit with the timelines of the texts being attributed to them, but also provide various types of evidence that explain their motives for ghostwriting. And Part IX returns for an intricate analysis of a few pseudonyms or ghostwriting-contractors who were uniquely difficult to exclude as potential ghostwriters; in parallel, these chapters question the reasons these individuals would have needed to purchase ghostwriting services.

“The complete series on British Renaissance Re-Attribution and Modernization by Anna Faktorovich is a remarkable accomplishment. Based on her own unbiased method of computational-linguistic authorial-attribution, she has critically examined an entire collection of texts, many previously inaccessible and untranslated to modern English. From a variety of distinct factors that have been ignored or unnoticed in the past, she identifies a group of ghost writers behind many miss-attributed Renaissance works. Of particular interest are works traditionally attributed to William Shakespeare. Dr. Faktorovich is a prolific writer, very well informed in English literature, philology, and literary criticism, and she is clearly thorough and detail-oriented. Her re-attribution and modernization series demonstrates solid scholarship, fresh perspective, and willingness to challenge conventional thought and methodology.” —Midwest Book Review, Lesly F. Massey (December 2021)

List of Figures

Part I: Methodologies of Re-Attribution

Introduction: The Ghostwriting Workshop Behind the British Renaissance

A New Computational-Linguistics Authorial-Attribution Method Described and Applied to the British Renaissance

An Impressionist Overview of the British Renaissance Ghostwriting Workshop

Attribution Clues in Contemporary Allusions to “William Shakespeare”

The Patterns Distinguishing the Six Authorial-Signatures of the British Renaissance Ghostwriting Workshop: The Case Against “Shakespeare”

Structural Divergences Between the Established “William Shakespeare” Canon and the New Re-Attributions

Part II: The Birth of the British Theaters

“Philip Henslowe’s” Financial Schemes as a Theater Landlord

Manipulation of Theatrical Audience-Size: Nonexistent Plays and Murderous Lenders

Crime and Corruption Behind the Ghostwriting Workshop

Masters and Minstrels in the Renaissance Theater: Patronage for Propaganda

Part III: William Byrd

Rhythm, Music and Monopoly

Amidst William Byrd’s Fraudulent Pseudonyms and Piracy Litigations: “William Shakespeare”, “Thomas Morley”, and “Thomas Lodge”

Part IV: Richard Verstegan

The Secret-Secretary to Elizabeth I and James I

The Secret-Secretary to Aristocrats

Between the “Marprelate War” and the King James Bible

Part V: Gabriel Harvey

From Ghostwriting “Elizabeth I’s” Letters and “Spenser’s” Faery Queen to Debtor’s Prison

After Academia: “William Shakespeare”, “R.” and Other Bylines of Unlikely “Authors”

Part VI: Josuah Sylvester

The Case for Re-Attributions to a Court Poet

Circuitous Evidence of Ghostwriting

Aristocratic and Royal Sponsors: “Robert” and “Mary Sidney” and “Henry Constable”

By Any Other Name: “William Shakespeare”, “George Peele” and “Joseph Hall”

The Ostracizing of the Jew in Renaissance England: The Disguise of the “Anonymous Writer”

Part VII: William Percy

The Tragedian “Shakespeare”

Plot Construction and Pericles, “Shakespeare’s” Strange Comedy

Attribution Case-Studies

“William Shakespeare” Apocrypha

Part VIII: Benjamin Jonson

The Comedian “Shakespeare”

Attributing Arden of Faversham

The Ghostwriting Workshop’s Subversive Autobiography: The Epigrams to “Fletcher-Beaumont’s” Comedies and Tragedies

Part IX: Studies in Exclusion of Potential Authorial Bylines

George Chapman: De-Attributing a Ghostwriter-Contracting Debtor

Nicholas Breton: Distinguishing Pseudonyms in Coded N. B. and B. N. Initials

Anthony Munday: Divergences Between the Thief and the Ghost Behind the “An. Mundy, Citizen and Draper of London” Byline

The Fletchers and the Beaumonts: Two Families of Ghostwriter-Contractors

Authorial-Group Chronologies

Bibliography: Texts Tested for Attribution

Index

Modernization of the Inaccessible British Renaissance

The first accessible translations of some of the best British Renaissance texts that have been tragically neglected.

Modernization of the Inaccessible British Renaissance opens to the public texts that have remained hidden in the archives because they have not been given the scholarly care lavished on the narrow standard canon of taught Authors. The absence of translations of these texts might have had a detrimental impact on world history because they explore the Islamic faith, homosexuality, promiscuity, and a myriad of other subjects with respectful warmth and acceptance that could have stopped wars of prejudice and unjust prosecutions across the past four centuries. These translations are executed with a unique method designed for this series that inserts a modern term into the body of the text to maximize reading-ease, and includes the original-spelling word or phrase, the source of the definition, and comments on alternative meanings in an annotation. Extensive annotations explain the meaning of proverbs, mythological and theological allusions, invented-words’ origins, and various other elements. As part of the British Renaissance Re-Attribution and Modernization Series, each text is accompanied by explanations regarding its computational-attribution and by additional evidence that strengthens these quantitative findings. One type of attributing evidence mentioned across the annotations is when borrowings of segments of text or plot and characters repeat across two or more texts in a single signature-group, such as those ghostwritten by William Percy. The translated texts are illustrated with enhanced versions of original artwork from their first editions. Most of these plays originally did not include Act or Scene divisions; these are added to orient readers in the text and to assist directors. A set of introductory elements that appeared in only some of these plays was added into all of them, including: “The Names of Persons” with character-summaries, “The Properties” that describe the set furnishings and design, and missing staging directions were added throughout the plays to help clarify characters’ interactions. Primary source materials accompany texts where they are needed to explain the originating historical or fictional plotline or the pre-translation language they are imitating. The introductory sections present documentary evidence and biographical materials about the ghostwriters. Each text is introduced with a history of its previous publications and performances. An overview of textual, attribution or other types of scholarly research about each text helps to orient researchers who want to explore further. Extensive plot synopses are provided, with explanations of the themes, tropes, and other noteworthy patterns. And sections on staging propose potential approaches by which these plays can be practically staged by modern troupes or cinematically presented on film. Staging diagrams of the furnishings, props and architecture are designed for each play to help theater directors pick a play most suitable for the resources of their theatrical space. And to assist busy teachers and professors with enlivening and kickstarting a class, each text is accompanied by sections of key terms, references for further reading, questions for further discussion (themes, story structure, close reading), and creative-writing, scholarly-writing, and dramatic-performance exercises.

List of Illustrations

Prefacing Notes on Sources, Abbreviations and Translation Style

PART I: WILLIAM PERCY

William Percy (1567?-1648) is the dominant tragedian behind the “William Shakespeare” pseudonym according to the computational-linguistic study in The Re-Attribution of the British Renaissance Corpus. Percy was a younger son of the assassinated 8th Earl of Northumberland and the brother of the 9th Earl, who was imprisoned in the Tower.

Introduction to Part I

The Three Letters of William Percy

Sonnets to the Fairest Coelia (1594)...

It goes on with summaries of the 12 volumes of translations with detailed introductions/annotations that provide additional evidence for the re-attributions.

---

I considered that all bylines represent real and unique people until the evidence convinced me otherwise. I also considered various combinations of more than six ghostwriters, or the presence of ghostwriters and authentic bylines, but the evidence led to the conclusions I present in the final books.

I asked Petroglyph to give the steps of his method because I had tested these steps previously after reading research in this field, and after receiving criticism from top computational-linguists on earlier versions of my research. As I mentioned in this discussion, I received a data-set similar to the one Petroglyph shared for his "Lunch" experiment after I had made this test with a top computational-linguistics specialist. I had found the same errors in these steps, and the same inaccessibility of these software tools, that I found when Petroglyph posted his answers. I had to ask him to give the steps because otherwise he could have said there were steps he was not disclosing (as he later did) that should have led to better results. If the steps he did disclose, and the process as it is described in the articles he cited, are (as I explained) faulty and riddled with errors, then I have succeeded in explaining why these "standard" computational-linguistic attribution method(s) are faulty and my method is the only one that gives accurate results fully supported with relevant data. I already cite my familiarity with the rival computational-linguistics software packages across the Re-Attribution series; I did not hide this fact; you are simply refusing to read my series and to find out what I show I know in those pages.

416lilithcat
Dic 13, 2021, 5:11 pm

>414 faktorovich:

We have now clearly gone Through the Looking-Glass.

"When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean—neither more nor less."

417paradoxosalpha
Modificato: Dic 14, 2021, 12:17 am

When you've got a tacitly idiosyncratic definition for a key term of your central claim, it's trouble.

418faktorovich
Dic 13, 2021, 5:52 pm

>416 lilithcat: The use of invented-words is very appropriate for the British Renaissance as these ghostwriters "invented" so many words they multiplied the English language from Middle English into Early Modern English, or close to what we speak today, Modern English. If you don't find my use of the term "ghostwriter" funny, you really should read my Re-Attribution series, and the joke will become funnier by-the-page.

419Petroglyph
Dic 13, 2021, 8:37 pm

>396 spiphany:

Thanks, spiphany. That's a much clearer way of phrasing things than mine.

>399 Keeline:
Yeah, that was a typo. I've altered it to "around 2000". Thanks for catching that!

420Petroglyph
Modificato: Dic 13, 2021, 8:54 pm

>406 reading_fox:

For higher n-grams, results tend to be a little better, I believe. I recall (though I'm not sure of the source) that 4-grams tend to perform a little better than trigrams. The release notes for the latest-but-one version of JGAAP, a stylometry tool developed by Patrick Juola, say this: "This release changes the default length used for Character N-grams from 2 to 10. This change is based on research results which suggest that slightly larger Character N-grams tend to yield better results." (No sources are cited, but they might exist.)

(Juola, by the way, is one of the computer linguists who were instrumental in detecting JK Rowling's stylistic quirks in Robert Galbraith's work. If you're on a Mac, and you want a GUI interface for playing at stylometry, give JGAAP a look. Instruction video here.)

"If you were to have re-run that clustering with 1,2,3,4,5.. gram lengths do you get better or worse results with longer grams?

I tested the Oz corpus in >309 Petroglyph: using n-grams between 1 and 10 characters (inclusive)

You get similar results for nearly all of them, in that the two authors remain separate almost consistently. The distances between Baum's cluster and Thompson's shrink a little as the n-grams lengthen; and the distances between some of each author's books grow larger; several books' closest pairings change as well. Some individual books are even re-assigned during one or two tests. I can provide graphs if you like!

Three individual works switch author during one or two tests:
  • For single-character n-grams (least reliable), Thompson's Cowardly Lion is assigned to Baum.
  • For character n-grams of lengths 7 and 8, Royal (the "mystery" work) is assigned to Baum instead of Thompson.
  • For n-grams of length 6, one of Baum's books (The Marvelous Land of Oz) is assigned to Thompson.

Still: Royal is assigned to Thompson in 9 out of 11 total tests (word frequencies plus the 10 character n-gram lengths). That's pretty solid evidence that she wrote almost all of it.
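
For anyone who wants to try that kind of sweep themselves, here is a minimal sketch of re-running a Stylo cluster analysis at character n-gram lengths 1 through 10; the corpus folder and settings are assumptions for illustration, not the exact configuration used for the Oz corpus above:

  # Sketch: redo the clustering at character n-gram lengths 1 through 10.
  # Assumes the stylo package and a ./corpus/ folder of plain-text files.
  library(stylo)

  for (n in 1:10) {
    stylo(
      gui               = FALSE,
      corpus.dir        = "corpus",
      analysis.type     = "CA",   # one dendrogram per n-gram length
      analyzed.features = "c",    # character n-grams
      ngram.size        = n,
      mfw.min           = 500,    # number of most-frequent n-grams to keep
      mfw.max           = 500
    )
  }
  # Comparing the ten dendrograms shows which cluster assignments stay stable
  # and which books switch sides as the n-gram length grows.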

421Petroglyph
Dic 13, 2021, 9:12 pm

This thread reminded me of a passage from Italo Calvino's If on a winter's night a traveller, translated from Italian by William Weaver. (Originally published in 1979, so the technology is out of date.)

I asked Lotaria if she has already read some books of mine that I lent her. She said no, because here she doesn’t have a computer at her disposal.

She explained to me that a suitably programmed computer can read a novel in a few minutes and record the list of all the words contained in the text, in order of frequency. “That way I can have an already completed reading at hand,” Lotaria says, “with an incalculable saving of time. What is the reading of a text, in fact, except the recording of certain thematic recurrences, certain insistences of forms and meanings? An electronic reading supplies me with a list of the frequencies, which I have only to glance at to form an idea of the problems the book suggests to my critical study. Naturally, at the highest frequencies the list records countless articles, pronouns, particles, but I don’t pay them any attention. I head straight for the words richest in meaning; they can give me a fairly precise notion of the book.”

Lotaria brought me some novels electronically transcribed, in the form of words listed in the order of their frequency. “In a novel of fifty to a hundred thousand words,” she said to me, “I advise you to observe immediately the words that are repeated about twenty times. Look here. Words that appear nineteen times:

blood, cartridge belt, commander, do, have, immediately, it, life, seen, sentry, shots, spider, teeth, together, your...


“Words that appear eighteen times:

boys, cap, come, dead, eat, enough, evening, French, go, handsome, new, passes, period, potatoes, those, until...


“Don’t you already have a clear idea what it’s about?” Lotaria says. “There’s no question: it’s a war novel, all action, brisk writing, with a certain underlying violence. The narration is entirely on the surface, I would say; but to make sure, it’s always a good idea to take a look at the list of words used only once, though no less important for that. Take this sequence, for example:

underarm, underbrush, undercover, underdog, underfed, underfoot, undergo, undergraduate, underground, undergrowth, underhand, underprivileged, undershirt, underwear, underweight...


“No, the book isn’t completely superficial, as it seemed. There must be something hidden; I can direct my research along these lines.”

Lotaria shows me another series of lists. “This is an entirely different novel. It’s immediately obvious. Look at the words that recur about fifty times:

had, his, husband, little, Riccardo (51) answered, been, before, has, station, what (48) all, barely, bedroom, Mario, some, times (47) morning, seemed, went, whom (46) should (45) hand, listen, until, were (43) Cecilia, Delia, evening, girl, hands, six, who, years (42) almost, alone, could, man, returned, window (41) me, wanted (40) life (39)


“What do you think of that? An intimatist narration, subtle feelings, understated, a humble setting, everyday life in the provinces ... As a confirmation, we’ll take a sample of words used a single time:

chilled, deceived, downward, engineer, enlargement, fattening, ingenious, ingenuous, injustice, jealous, kneeling, swallow, swallowed, swallowing...


“So we already have an idea of the atmosphere, the moods, the social background.... We can go on to a third book:

according, account, body, especially, God, hair, money, times, went (29) evening, flour, food, rain, reason, somebody, stay, Vincenzo, wine (38) death, eggs, green, hers, legs, sweet, therefore (36) black, bosom, children, day, even, ha, head, machine, make, remained, stays, stuffs, white, would (35)


“Here I would say we’re dealing with a full-blooded story, violent, everything concrete, a bit brusque, with a direct sensuality, no refinement, popular eroticism. But here again, let’s go on to the list of words with a frequency of one. Look, for example:

ashamed, shame, shamed, shameful, shameless, shames, shaming, vegetables, verify, vermouth, virgins...


“You see? A guilt complex, pure and simple! A valuable indication: the critical inquiry can start with that, establish some working hypotheses.... What did I tell you? Isn’t this a quick, effective system?”

The idea that Lotaria reads my books in this way creates some problems for me. Now, every time I write a word, I see it spun around by the electronic brain, ranked according to its frequency, next to other words whose identity I cannot know, and so I wonder how many times I have used it, I feel the whole responsibility of writing weigh on those isolated syllables, I try to imagine what conclusions can be drawn from the fact that I have used this word once or fifty times. Maybe it would be better for me to erase it.... But whatever other word I try to use seems unable to withstand the test.... Perhaps instead of a book I could write lists of words, in alphabetical order, an avalanche of isolated words which expresses that truth I still do not know, and from which the computer, reversing its program, could construct the book, my book.
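
Lotaria's lists are trivial to conjure for real. A minimal sketch in R, assuming nothing more than a plain-text file of a novel ("novel.txt" is a placeholder name, not a real file):

  # Sketch: a frequency-ranked word list and the words used only once.
  text  <- tolower(paste(readLines("novel.txt", warn = FALSE), collapse = " "))
  words <- unlist(strsplit(text, "[^a-z']+"))
  words <- words[words != ""]

  freqs <- sort(table(words), decreasing = TRUE)

  head(freqs, 20)                      # the most frequent words (mostly articles and pronouns)
  names(freqs[freqs == 19])            # words that appear exactly nineteen times
  head(names(freqs[freqs == 1]), 15)   # a taste of the words used only once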


422Petroglyph
Modificato: Dic 13, 2021, 9:48 pm

>395 faktorovich:
I did not realize you thought being neurodivergent such an extreme "form of insult". For what it's worth, I'm sorry I let my frustration get the better of me. I should have phrased that more tactfully.

I want to be clear, though: that was not an idle accusation that I thought up to hurt you. It is an assessment based on a number of your comments and your behaviours in this thread, several of which I recognize from previous experiences with people who have OCD-like symptoms (including myself). OCD I can work with. Trolls, not so much.

I've thought a bit about the sources of the friction between users here and your own comments. Like a good Youtuber, I've listed them.

The list in (1) covers things that make it hard to interact with you because you seem to be operating under a series of assumptions and habits that the rest of the commenters here just do not share. A lot of the friction in this discussion comes from this fundamental disconnect. It's good to be aware of that.

The list in (2) covers things that make it very hard to interact with you because the only arbiter of your results that you are willing to consider is yourself. Your comfort zone and your habits, built up and sustained over several years, determine what is acceptable and what is not, what is a good method, or an acceptable result. Consequently, you leave open only two options: agree with you in every way, or be smeared and excoriated. You've left no other option open to your interlocutors. It's an unreasonable expectation.

The list in (3) covers things that make it very hard to interact with you because we cannot trust your own assessment of your skills. Those with actual experience in linguistics, statistics, data science, and writing up results have identified model-breaking problems in your approach to these fields of enquiry, but you deny or ignore all of your model's shortcomings.

There's other things I could list (logical and argumentative fallacies such as false equivalences, gish galloping, moving the goalposts, pretending to be the victim, unfounded accusations of censorship, page-count-equals-quality, lies, demonizing the alternative, pretending to be unbiased), but I'm done thinking about you for a while.

  1. Black/white thinking. You are unreasonably categorical in the way you contrast your method with alternative ways of doing things.

    • You have absolute faith in the results of your method, such that if it disagrees with centuries of established fact, it is the latter that must be wrong.

    • You are "convinced (...) that there are no coincidental matches; all matches indicate shared authorship." Other people might admit that their methods are not exempt from chance similarities and the laws of probability. You discount even chance and coincidence as a source of false positives.

    • You only apply the 27 tests you've settled on. No variations; no subsets of, say, function words or content words that aren't already included in your handcrafted spreadsheet; no additions; no subtractions. There is only one correct way of doing authorship attribution, and it is infallible, and it must only ever be applied in one singular way.

    • When asked to subject your own texts to your method (to test if your works reveal a single author signature), you say >129 faktorovich: "Sure, I can apply the tests to myself. You should know that I have done professional ghostwriting in the past, though I cannot disclose for whom. You would have to test millions of texts with around a million bylines to figure out what I have ghostwritten, as it is only a few projects in a sea of modern publishing". Here, and in several other places in the thread, you only see a single monolithic use for this methodology of yours, and it's to sniff out alleged ghost-writing and collaborations (only using corpora that are practically guaranteed to contain chance similarities). It's like you are unable (or unwilling) to see any other use for your method at all! (e.g. to detect single author signatures in a body of work you know to be by one individual.) This has happened multiple times.

    • You're comfortable with character strings of the lengths you already use in your methods: single characters, individual words, three-word phrases, and sentences, and you discard others (e.g. character trigrams) as "non-computable" and "not at all relevant or rational". Human-sized units of meaning you accept; machine-assisted chunks you treat as categorically beyond reason.


  2. Here are a couple of things that you yourself, or your method, cannot do (or that would be prohibitively labour-intensive to do manually), and you treat all of them with open hostility and disdain:

    • You fundamentally distrust any kind of sampling technique, e.g. breaking up novels of varying lengths into smaller identically-sized pieces for comparison purposes. (Even if multiple samplings eventually cover the entire novel!), and you only accept looking at entire documents in the way that you do things: by feeding them to an online service in their entirety. Other means of looking at data are smeared with accusations of data-deletion and peddling faked results to poor people who don't know better.

    • Trigrams (or four-grams, or whatever), with or without a rolling window: "not at all (...) rational", apparently.

    • You reject totally out of hand any stats and computational tools that are too advanced for you: you're not happy to live and let live -- you go out of your way to repeatedly call them "nonsensical" and "impossible" and "absolutely absurd" and "mathematical magic tricks", and you treat them as though they are inherently obfuscatory because you (or people at your skill level in maths) can't verify them personally. Just ctrl+f or command+f "nonsensical" in this thread. At the opposite extreme: you take the free public accessibility of your online text analyzers as a proxy for their inherent correctness and accuracy. They have already been invented, and so no more work needs to be done. Attempts to develop more refined tools for specific purposes and under specific circumstances are excoriated, usually in categorical, black/white judgments. (e.g. in >136 faktorovich:, where you write "the R programming language is designed for folks to create statistical tools that count the number of commas etc. These programs have already been developed and are available for free online, so it is nonsensical for anybody to write new programs that repeat this completed labor." -- there's that word nonsensical again)

    • When asked to follow a series of simple steps in R & RStudio (an environment that takes a long time to master and become comfortable with), you give up at the first sign of trouble. No troubleshooting, no following advice to remedy the problem, no learning, no asking where exactly the problem lies. Instead: complete rejection of the entire project, and retrenchment in your own familiar methodology.


  3. Here are things you say and do that may make others wary of your conclusions:

    • When talking about the kinds of word-frequency measurements that are much more sophisticated than your absolute frequencies, you refuse to admit that they can provide more explanatory power than your model. You seem to think that, since both your absolute frequencies and e.g. Burrows' Delta can be called "word frequency measures", it is perfectly acceptable to treat them as equivalent (a bare-bones sketch of what Delta actually computes follows at the end of this list); and since your model also does punctuation etc., you feel justified in touting the superiority of your model. This is the maths equivalent of reading only the headlines.
      Similarly, the Darmon et al. study performed a) tests on many more punctuation marks than you, and b) stats of such complexity that you had to look them up. But none of that relative sophistication matters. After all, you reason, Darmon et al. only look at punctuation and sentence length, and you look at punctuation and sentence length as well as other things, so you claim your method is inherently superior.

      To other people, the difference in complexity of the calculations (and, consequently, in the value of their results) is intuitively obvious. Other people see diversity and variety under the heading of "word frequency measures" and judge them accordingly. You behave as though grouping things together under the same name makes them equivalent.

    • You have learnt that letter and word frequencies may offer stylometric clues. You may have learnt that punctuation use can reveal individual author patterns. And your implementation of these features consists of taking incidental, coincidental, naturally-occurring variations in their absolute frequency entirely at face value and as diagnostic measures, with differently-named patterns and all. It's the n00best of beginner mistakes, and you've been doing it for years.

    • You seem to think (>379 faktorovich:) that counting absolute frequencies of some punctuation marks independently of each other and, as another independent calculation, average sentence length, suffices to cover all possible permutations of these variables. That is not how mathematics works. That is not how any of that works. At all! This is a fundamental misunderstanding of the absolute basis of your method, and you cannot see it (or don't want to).

    • You are "convinced (...) that there are no coincidental matches; all matches indicate shared authorship." This is such an absurd claim that it cannot be true. It immediately makes your claims suspicious.

    • You affect confidence and experience where none is apparent.
      • In >136 faktorovich:, you say "the R programming language is designed for folks to create statistical tools that count the number of commas etc". This is hours after being introduced to R. To anyone even slightly familiar with the software, your characterization is obviously wrong. There have been several other such instances.
      • I'll also mention here your explanation of "joint probability" which serves as a pretext for rejecting the applicability of the notion entirely (>379 faktorovich:) -- written by someone who has no idea what they are talking about.
      • You assume (in >379 faktorovich:) that Darmon et al. simply combined multiple averages of individual tests in order to achieve the result they reported for their effect -- and when that calculation yields the wrong result you pretend that Darmon et al. are lying and falsifying their results instead of considering the possibility that you are incorrect and that the combined result was achieved via other means.
      • When you do not understand a graph or a statistical method, you claim they don't "make any rational sense" (>379 faktorovich:); other people might consider that the confusion lies in a lack of understanding on their part.

    • This song and dance routine in >415 faktorovich: of Oh, I knew about R/Stylo long ago I was just testing you teehee that's what a good teacher does, dontchano (>409 faktorovich:), and this after-the-fact retconning of your inability to open a .txt file in a spreadsheet ("the Excel file glitch that I now recall I solved when I last received this type of Stylo data-set" >322 faktorovich:). I don't believe you. And even if it were true, that only means you're arguing in bad faith. Either you're lying to us and pretending to possess expertise you haven't got, or you're stringing us along in a game only you are playing. Neither behaviour should be rewarded with continued engagement. Either behaviour is grounds for distrusting other things you say.
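
(For anyone reading along who wants to see what a "more sophisticated word-frequency measure" actually looks like: below is a bare-bones sketch of Burrows' Delta in Python. It is not Faktorovich's method, and not the exact implementation used in any published study; the whitespace tokenization, the 30-word cutoff and the function names are illustrative choices only.)

    from collections import Counter
    import statistics

    def relative_freqs(text, vocab):
        """Relative frequency of each vocabulary word in one text."""
        words = text.lower().split()
        counts = Counter(words)
        total = len(words) or 1
        return {w: counts[w] / total for w in vocab}

    def burrows_delta(test_text, corpus_texts, n_words=30):
        """Bare-bones Burrows' Delta: z-score the relative frequencies of the
        corpus's most frequent words, then average the absolute differences
        in z-scores between the test text and each corpus text."""
        corpus_counts = Counter(" ".join(corpus_texts).lower().split())
        vocab = [w for w, _ in corpus_counts.most_common(n_words)]

        profiles = [relative_freqs(t, vocab) for t in corpus_texts]
        means = {w: statistics.mean(p[w] for p in profiles) for w in vocab}
        stdevs = {w: statistics.pstdev(p[w] for p in profiles) or 1e-9 for w in vocab}

        def z_scores(profile):
            return {w: (profile[w] - means[w]) / stdevs[w] for w in vocab}

        test_z = z_scores(relative_freqs(test_text, vocab))
        deltas = []
        for profile in profiles:
            cz = z_scores(profile)
            deltas.append(sum(abs(test_z[w] - cz[w]) for w in vocab) / len(vocab))
        return deltas  # one Delta per corpus text; smaller = stylistically closer

The point is not that this particular sketch is authoritative; it is that standardizing frequencies against the corpus before comparing them is a different, and more informative, operation than tallying raw absolute counts.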

423faktorovich
Dic 13, 2021, 11:44 pm

>422 Petroglyph: Petroglyph, step back from your latest email for a moment and contemplate what you have written here. This is an extensive and repetitive personal attack on my character that only pretends to scientific content when it changes what I have actually argued across this thread into what you imagine I have argued, or adds things, without quotes, that you claim I said but never did. Why don't you stop these nonsensical circles around what you conceive to be my personal faults, and just focus on the math and the literature and the history that is at stake? I could address every point you raise here to show how it is false, but I have already addressed all of these points separately across this discussion, and other users have asked me to avoid repetition. But to summarize my main objections to the falsehoods in your tirade: 1. If my conclusions are correct, I must assert that they are correct for the sake of scientific progress.

2. Across my Re-Attribution series, I prove that my conclusions are correct with overwhelming evidence, including proving that there are no "coincidental" matches in the 284 texts I tested from the Renaissance; I prove this absence of coincidences with other evidence related to these texts beyond the computational-linguistic findings.

3. I have applied only a handful of basic tests in a brief experiment on Twain/Dickens earlier in this thread; I applied 28 tests to the texts in the 18th century; I vary the combinations of tests slightly for corpuses in different centuries if there are more transcription errors etc. that some periods are more prone to. The 27 tests are the set I applied to the Renaissance consistently, so that all 284 texts are compared on the same set of 27 tests. My ability to vary the number of tests, or insistence on any single number, is another example of you trying to find a mental glitch in my thinking, instead of evaluating the scientific accuracy of my approach, and without actually looking at my body of research to judge if what you assume to be always true is actually true.

4. You are so eager to find some kind of fault with me personally that you state, "It's like you are unable (or unwilling) to see any other use for your method at all! (e.g. to detect single author signatures in a body of work you know to be by one individual.)" What? Now you see it as a fault if you are imagining other applications of my method that I have not anticipated? There are many applications of Einstein's theories he did not anticipate either. The idea of testing a set of texts to confirm they are by the known single author is nonsensical; there is no research question or mystery to solve, unless you doubt the group is actually by that author and thus ghostwritten, or you doubt some other question that is not stated in the simple phrasing you used to describe this non-experiment.

5. Then you suggest I accept "character strings" like "single characters", but "discard others (e.g.) character trigrams" as "non-computable". This contradicts what I said and what you are saying, since a three-word phrase is a "trigram", and I measure it with my method. It is just irrational to compare these phrases between authors because of their uniqueness in individual texts; in contrast, the top-6 letters and top-6 words form only a couple dozen unique patterns even when I compared 284 different texts, so these patterns can be mathematically compared against each other, as a significant percentage of them is likely to share the same patterns. In contrast, most phrases might appear only once among the top-6 in a single text, or they might appear in 10 different texts; it is irrational to give statistically consistent weight to all of these.

6. "You fundamentally distrust any kind of sampling technique." You make this sound like a disease. I explained that I use portions of texts such as scenes or poems when I need to test just that portion because scholars have questioned if it was written by a different author from the rest of the text. There are instances when testing a portion is meaningful. It is meaningless to break down all texts into pieces and performing tests on segments separately when these smaller portions are likely to lead to far less accurate results. I have explained this many times and you still do not understand what I am saying.

7. I previously explained the errors in the standard computational-linguistics methods, and you repeat that I have called them misleading and erroneous as if this means they are perfectly correct. I cannot correct your errors if you see my corrections as confirmation that you are always right.

8. You gave erroneous steps that could not lead to anybody applying them to get any meaningful results with Stylo/R. If you and the software developers of Stylo cannot give instructions on its use, the fault is not with the end-user. When asked for your solution to the problems faced, you refused to give any further directions because there was no solution for these glitches with this program.

9. None of the rival computational-linguistics methods I have reviewed that have been described in the various articles you have mentioned or those I have cited in Volumes 1-2 have proposed the complexity and clarity of analysis that is involved in my 27-tests method. Stylo's basic functioning only tests for words. The Darmon study focused on punctuation, alongside its combination with word-counts and other related measures that jointly comprise a small portion of the range of tests I applied. I exclude some punctuation marks because they lead to glitches in the Renaissance; Darmon explained this need to exclude some marks depending on the century studied, if you re-read that article.

10. "That is not how mathematics works"? The surrounding points you make do not refer to any mathematical concepts other than you suggestion that frequency is insignificant for attribution, when Stylo uses word-frequency as its main measure of attribution, and you have said and shown that you prefer Stylo. Thus, you contradict this assertion, and this whole argument falls apart... whatever it is you are trying to say here.

11. I have found specific statistical errors in Darmon and all of the other articles you have cited and described in detail. You have ignored all of the objections I have raised, and instead you are trying to find fault with my personality or my phraseology. This strategy is obviously needed since you have no actual response to my real objections to the errors in these "standard" computational-linguistic approaches. There is blatant data fabrication, falsification, presentation and miscalculation in these studies. You have not even attempted to answer any of the specific examples of this that I have raised.

I really hope others in this group will ask me other questions and will put aside repeating and recycling these same false objections you are trying to stress. You are focusing on these points because you have no rational reasons to reject my theories. And yet you have to insist my method is wrong because it proves currently accepted methods to be in error.

424bnielsen
Dic 14, 2021, 1:43 am

>421 Petroglyph: Thanks for that Calvino reference!

It reminds me of "Le littératron" by Robert Escarpit where a machine is capable of producing texts based on a loose specification (or if I remember right: a setting of dials on the front of the machine).

And the finding of unique words is something I do for my reviews. So here is the automatically generated list of unique words in my (rather incomplete) review of Litteratronen:

Litteratronen Escarpit elektroniks successivt litteratronen Poldavien perfid perfiditeten Capitoleum tarpejiske

I do it mostly to catch typos, but as stated by (the fictional character) Lotaria you can also sometimes get a feeling for a book's contents. For non-fiction books I try to include a Table of Contents, so a book of recipes with only a few new words indicates a rather dull cookbook. :-)
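
One simple way to generate such a list (a rough sketch; the regex and the function names are just illustrative): collect the vocabulary of all the other reviews, then keep the words that occur only in the review at hand.

    import re

    def words(text):
        """Lower-cased word tokens; keeps non-ASCII letters such as ae, oe, aa."""
        return re.findall(r"[^\W\d_]+", text.lower())

    def unique_to(review, other_reviews):
        """Words that occur in this review but in none of the others --
        handy for spotting typos and unusual vocabulary."""
        seen_elsewhere = set()
        for other in other_reviews:
            seen_elsewhere.update(words(other))
        return sorted(set(words(review)) - seen_elsewhere)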

425Keeline
Dic 14, 2021, 1:43 am

About 20 years ago I decided to spend some time seeing if stylometric analysis could say anything interesting about the juvenile series books which have been my specialty since 1988. As I have stated, if it is not a majority, it is a significant plurality of them that were published under personal, publisher-owned, or packager-owned pseudonyms. For most of that period I have been adding and refining entries in a Series Book Encyclopedia. I have long been fascinated by this authorship puzzle, and this emphasis is reflected in most of the entries of this forthcoming book.

Sometimes the authorship is not much of a mystery and the use of pen names was for the marketing convenience of the publishers. Sometimes the extrinsic evidence for the authorship appears in obscure newspaper articles or library archives like New York Public Library for the Stratemeyer Syndicate, the packager responsible for series like Nancy Drew. But there are many authorship mysteries remaining such as cases where a couple titles are attributed but others remain uncertain and this is where I wanted to see if stylometrics could provide any suggestions.

I found a technique that appealed to me and wrote a PHP program to implement it and its special kind of graphs. Its creators were successful in having the technique admitted as evidence in British courts. Yet it does not find favor with everyone, because it is hard to understand why its tests work. Why would certain kinds of counts be meaningful at all? I decided to proceed with an open mind and see if it would fit the understood authorship and say anything about the unattributed works.

In order to find out if it could help with my inquiries, the co-creator of this technique provided some guidance on how to prepare texts for it, make an analysis plan, and interpret the graphs.

To develop some familiarity with the technique, I started with works where the authorship was understood with high confidence. This included letters which have the least editing and outside influences. The technique had 10 types of counts and some were more likely to be useful for one author compared with others. Often there are three or four of the tests which seem to be good for an author.

This process is completed for other texts of greater or lesser confidence in attribution, and the best tests are identified for each text. When there are best tests that two or more texts have in common, portions of the two texts can be processed with the relevant tests and the generated graphs evaluated.

Comparing a few dozen texts under this plan, most of the texts with high attribution confidence remained so. Some of the previously unknown attributions gained at least some additional intrinsic evidence, which could then be checked against extrinsic evidence for corroboration.

In the series book world there are academics and fans. Sometimes members of either group are prone to believe that they can closely read a writer's texts and recognize whether an unknown or disputed text is or is not the work of that author they favor.

In one case, a writer who belonged to both groups published an article with his observations on the authorship of a series of medium length. He labeled three groupings of authors — P, A, and B — listed elements that each of the groupings had in common, and it sounded plausible. This was written before the Stratemeyer Syndicate records were available. Once they became available in the mid- to late-1990s, I discovered through contract releases and extant correspondence that all of the volumes in the series were attributed to the same ghostwriter, one of the Syndicate's more prolific ones but one without a large fan base.

One scarce story was attributed in one of the articles to a prolific ghostwriter. Few had a chance to read it. When they did, it seemed plausible that it could be by that writer. However, once the Stratemeyer Syndicate records were available, it became very clear that another, less-prolific ghostwriter was responsible for it. The close reading had to give way to the extrinsic evidence that could not be explained away. No one expected these records to be seen by the public, so there was no need to fabricate letters and release contracts for a little-known book that did not even get its announced sequel volume.

The idea of bringing in computers to count elements that are not so easily seen by readers seemed appealing and it remains so. But any of these tests can be subject to bias as well, and it is important to recognize that. Stylometrics brings in another facet of observation but not absolute answers. It should be used in the full context of the other evidence.

As mentioned before in this interesting thread, extraordinary claims require extraordinary and convincing evidence, particularly when they counter large bodies of evidence and a long period of scholarship.

James

426bnielsen
Modificato: Dic 14, 2021, 2:43 am

>425 Keeline: Nice. Thanks for sharing. The story shows how careful one needs to be with interpreting results from these tools.

427spiphany
Modificato: Dic 14, 2021, 4:17 am

I'm still feeling confused about the references to "translations" of seventeenth-century English texts.

Modernizing a text and providing a commentary is not "translation".

These texts were written in (early) Modern English. There are differences in vocabulary and above all spelling and punctuation from contemporary English, but they are written in a language which is essentially comprehensible to a contemporary reader with some practice and perhaps some help with the more unfamiliar word usage.

Certainly the language should be perfectly understandable to a scholar of the English Renaissance, and it should not be necessary for said scholar to first "translate" these works into a modern idiom in order to conclude that they are worthy of being made accessible to a wider public.

A table of contents is not a summary, by the way. Nor is simply pasting large extracts from your introduction. Again, I expect that anyone with your qualifications should know this.

One person's failure to figure out how to use software that hundreds of people have used successfully does not prove that the software does not work.

428MrAndrew
Dic 14, 2021, 9:28 am

I think i've found a new drinking game.

"Nonsensical": 31 times
"Overwhelming: 8 times
"27": 128 matches (including a surprising number of posts made at 27 minutes past the hour. Coincidence? I'm not convinced.)

429bnielsen
Modificato: Dic 14, 2021, 10:00 am

>428 MrAndrew: Ah, a text analysis of a discussion of a text analysis. It contains the 1-grams Nonsensical, Overwhelming and 27, so the identity of the ghostwriter is obvious.

430cpg
Dic 14, 2021, 10:36 am

>427 spiphany:

Definition I.4 of "translate" in the OED seems to allow translation of Early Modern English into Modern English. And Google seems to have found many references to "translations" of Shakespeare into modern English. For example, the website of the Oregon Shakespeare Festival has an article about "translating Shakespeare’s plays into contemporary modern English".

431MarthaJeanne
Dic 14, 2021, 11:03 am

But it seems to me that if I want to read these pieces, then I want to read them in the original, with notes and maybe a modernized version to help. I end up reading a lot of nonfiction in translation, because the library has it, but even that is not really satisfactory, and for literary texts, if I'm going to read them, I want to know what the author said.

432Keeline
Dic 14, 2021, 11:57 am

>431 MarthaJeanne: Earlier works still, like the stories by Chaucer or Le Morte d'Arthur and others, do need some translation into modern language for many readers.

But what does this "translation" mean to an authorship attribution analysis?

Back in the early 1980s it was common to type in BASIC computer programs from magazines. Often these were in a dialect for a computer model different from your own and it was necessary to translate the source code to something your machine could run. The method to clear the screen varied a lot from one to another. This built the understanding of the language plus the programs usually worked unless they had some machine-specific trickery involved.

When it comes to foreign language translations, I am most familiar with the often poor work done in the 19th Century on Jules Verne texts to rush them to the U.K. and U.S. markets.

In one common translation of Twenty Thousand Leagues Under the Sea, an entire chapter describing the interior of the Nautilus is omitted and there are many errors throughout the rest of the translation. This bad translation is still commonly republished by publishers who don't know any better or just don't care to look for a better one.

In one U.K. translation of Journey to the Centre of the Earth there are major changes made to Verne's story, including whole scenes that were added by the anonymous translator.

Modern translations of Verne are more faithful to the author's work and, when possible, go back to the extant manuscripts and the periodical and early book editions.

But there are reasons why it is important to credit translators because their work does have an effect on what is read. So I would be very cautious about knowing the exact nature of any "translation" in an authorship study.

James

433norabelle414
Dic 14, 2021, 12:07 pm

>428 MrAndrew: Don't forget "glitch": 38 times

434andyl
Dic 14, 2021, 12:22 pm

>432 Keeline:

You don't even have to go back to the 19th century. Lem's Solaris is a case in point - it was not translated into English from the original but from a poor French translation. See https://www.theguardian.com/books/2011/jun/15/first-direct-translation-solaris

435faktorovich
Dic 14, 2021, 12:41 pm

>425 Keeline: Dear James: Thank you for sharing your perspective. It sounds like your "10 types of counts" technique is closer to my 27-tests method than most of the other methods out there. I have found it is important not to exclude any tests just because a given author does not have an easily identifiable pattern in some of them. For an experiment on a corpus to be fair, it is important to apply the same set of tests to the full set of texts without any bias towards any of them. Even if some tests do not seem to identify an author, or show mismatches that contradict the findings of most of the other tests in the set, these seemingly anomalous matches can point to important generic, editorial or similar contributions to, or variants of, the text.

You also should avoid processing only "portions" of texts, and instead consider the entire texts you are trying to attribute, as I explained earlier (since size improves attribution accuracy). And you should not rely on graphs alone, but also consider the full set of raw data, or the results for all of the individual tests for a group of texts, to see if there are attribution clues there that are not apparent from the summary graphs. And if there were some texts with a high degree of initial authorship-attribution confidence that did not re-affirm these bylines, these exceptions should have been more closely examined to see if more than one ghostwriter was writing or collaborating under seemingly known single bylines.

Intuitive attributions such as the distinction you note that was made between P, A, and B are entirely unreliable, and you should avoid giving these your attention, as they can mislead you into seeing three styles where there might only be one, or some other combination, quantitatively. The types of elements found to be similar in any given group of texts should be considered in terms of whether these elements represent generic or mimicable patterns, or whether they are elements an author cannot consciously control; the latter is what matters for accurate attributions. The discovery that P, A, and B were all really one ghostwriter doesn't surprise me, as I have found this type of contradiction even in handwriting analysis of a Renaissance play like "Thomas More", which has been interpreted as having multiple handwriting styles by some intuitively, and only one handwriting style by others; and handwriting is less complicated than the fine points of written texts.

Also, the public records of an entity like Stratemeyer can have false information in them by omission, by deliberate misleading, or because ghostwriters they hired used sub-ghostwriter(s) or re-contracted the work to somebody else. There are many possible scenarios, and more than the texts of a single publisher should be tested to establish true attributions. And reading the accounting books of a publisher that hires ghostwriters has to be done in a very skeptical manner, without just trusting the surface reading; I found this to be the case with "Henslowe's Diary", which has been broadly misinterpreted by scholars who have missed its real significance. While skepticism is necessary, finding documentary evidence or any other type of evidence to check a computational-linguistic finding is absolutely needed for it to stand up in court or in print. I have found this evidence and described it in the 14 books of my Re-Attribution of the British Renaissance series so far, with around 14 more books to come. Let me know if you need my help with your project(s).

436spiphany
Dic 14, 2021, 12:42 pm

>432 Keeline: No, to be fair, she does say that the attribution was done using unmodernized texts.

But statements like the one below (from the blog post) make me wonder how the word "translation" is being used.

As I mentioned, none of William Percy’s plays or poetry, or the plays I am re-attributing to him in this part of the series, have ever been translated into accessible Modern English before. In the middle of my computational study, it became clear that the “Shakespeare” plays and poetry translated into Modern English registered as a separate linguistic signature from these same texts in their original spelling. In other words, editors have made such heavy changes to the canon of “Shakespeare” texts that this resulting style is a distinct linguistic signature or author.


I mean...there are Shakespeare versions and Shakespeare versions. From editions that merely update the orthography and punctuation to something more familiar to contemporary readers, to versions that completely recast Shakespeare's words into modern idiom, and probably everything in between.

(By the way, the fact that a test which relies on things like punctuation and word lists gives different results when using modernized vs. unmodernized versions of a text is exactly what I would expect to see. I'm not sure why this is being presented as a discovery; it's something that should be taken into account from the beginning in the methodology of a project of this sort.)

437faktorovich
Dic 14, 2021, 12:51 pm

>427 spiphany: The terms "translation" and "modernization" are used interchangeably when describing changing a text from Early Modern English to Modern English. You can check previous such translations to see both variants used in introductions. Volumes 3-14 of my series are also definitely translations because all of them also include content in other languages, including Latin, Italian, French etc., so at least those sections are translations, and the texts are incomprehensible without the various types of translation needed. If you want to check if you could understand one of these texts without my translation, just read this first part of the Robin Hood trilogy that has not been recognized as such by past scholars: https://www.google.com/books/edition/Look_about_You/aIw4AQAAMAAJ?hl=en&gbpv=.... If you can read the whole thing cover-to-cover and know what happened, you are right that no translation is needed; otherwise, this is why I made these first-ever translations of these important texts and annotated them with explanations about why they further re-affirm my attributions.

I posted the book summaries, not the introductions. I have tested all the computational-linguistic software packages claimed to be free in the articles I read, and none of them were functional/accessible. They are designed to be inaccessible to users (including blocks where you need permission from the creator to use them), so that users instead pay programmers to use them on their behalf. And this creates a distance between the literature researchers and the raw data, so that researchers have to rely on finished graphs from these programmers without access to what happened to produce the seemingly simple attribution conclusions in the visuals they receive and then have to re-use in their write-ups. I have explained these errors in detail already; you have to stop running in circles on this point unless you can prove otherwise.

438faktorovich
Dic 14, 2021, 1:00 pm

>432 Keeline: I looked up every single word in the 12 volumes of translation I have released in this series. I checked them against all accessible dictionaries of Early Modern/ Middle English, as well as for contextual usage in other texts. I support some of the more complex translations of words in detailed annotations, where I describe how a given rare word was used elsewhere, its significance etc. I re-ordered many sentences to make them comply with modern grammatical rules, but I never deleted any words without checking for modern alternatives etc. I introduce each text with extremely detailed scholarly introductions, staging directions, extensive plot synopsis, and various other elements needed for further scholarly research into these previously inaccessible texts. The body of the texts themselves can be read for enjoyment without interruption due to the translation style I use, or alternatively a scholar can read the annotations, introductory matter etc. as well that make up around half of each of these scholarly editions. I do not know why you are guessing about what my translations include, when you can just ask me for a review copy of the series, and see for yourself.

439Keeline
Dic 14, 2021, 6:58 pm

>309 Petroglyph: One of my colleagues is an expert on Oz matters like I am about Stratemeyer. He has been doing it for much longer (60 years) than I have (33 years). I wrote him to ask about when the public could have become aware that The Royal Book of Oz was not merely based on a partial manuscript or notes left by Baum at the time of his passing but is entirely or nearly entirely the work of Ruth Plumly Thompson.

There is an article by Peter Hanff in the forthcoming Winter 2021 issue of The Baum Bugle which is the long-running magazine for the International Wizard of Oz Club. Peter shared the article with me and it was a funny coincidence in timing that he had just written about the transition between Baum and Thompson for this series.

Summarizing the 6-page article, he used letters to the publishers to document some of the troubles that Baum had financially, causing him to declare bankruptcy. These financial needs and health issues created a requirement to write more and do more of what the publisher wanted. After needing gallbladder surgery, he wrote ahead and had a couple of manuscripts completed for the publisher's annual Oz book. He died in May 1919, and two more Oz books of his were published in 1919 and 1920. The Royal Book of Oz was published in 1921. The article includes the contract agreement between Thompson and the publisher for this book but says nothing about how much she received to work on it. She had already been a writer in Philadelphia working for newspapers, etc.

The publisher announcements and ads of the time of the release set up the notion that the new book was based on some material Baum had left. It is not clear how much this might be at this point.

Years later a January 1934 newspaper interview included the lines:

"When Mr. Baum died in 1919, leaving the uncompleted manuscript for 'The Royal Book of Oz,' his publishers decided that my newspaper page introduced a fairy land in the same spirit as the land of Oz, so they asked me to work over Mr. Baum's notes to see what I could do."


Baum might have left only a page or two of summary ideas of what he planned to do with the story.

This sort of thing is far from unprecedented. "Oliver Optic" (William T. Adams) died before he could complete his Blue and Gray on Land series. Edward Stratemeyer was called upon to write the story. Here, too, the publisher wanted to make the connection with the original author. But the available correspondence, which is less than one would hope, suggests that Stratemeyer created his own outline and story and only referred to the cast of characters and previous volumes to use as a basis.

The juvenile series book world has other examples of other writers continuing a series when its writer dies. Another of Baum's series which was published under a pseudonym was the Mary Jane series. He wrote some volumes before his death and it was continued by another writer, Emma Speed Sampson. She also continued writing for two other series which were begun by her sister and another writer. Some of these were under the established pen names and some were under her own name.

I know of cases like the Camp Fire Girls books by Irene Elliott Benson where the named writer did two volumes and died. The series was continued by a male ghostwriter until it filled out to six volumes. These were published with the Benson name but the latter four were under a real author's name as a pseudonym — the modern example of ghostwriting for celebrity books. The six books were reissued under yet another pen name, as occurred in this sort of series publication.

Evelyn Hunt Raymond started a "Dorothy Chester" series for Stratemeyer from his outlines. After difficulty with his publisher, he pulled his books and took them to another publisher (lawsuits were involved). Raymond wrote new "Dorothy" titles for the old publisher but she died a couple of years later. A male ghostwriter continued the series. Eventually Stratemeyer sold the two volumes he owned to the later reprint firm and the series had the length it is now known by. Working out which volumes were by Raymond and which were by the ghostwriter is one of those curiosities which will hardly make a difference to anyone but is a possible application of stylometrics.

I could name half a dozen other interesting examples but know this reply post is already long.

James

440prosfilaes
Dic 14, 2021, 7:00 pm

To boil my complaints down to one factor, there's no verification, no grounding. You have some tools that give you a similarity factor between two works. You then say that, based on that similarity factor, there were only six writers for this entire period. But you've never shown any evidence that that factor indicates that two works are written by the same author. Even if it is an accurate tool for that, you don't seem to have done any studies to see what level of similarity indicates the same author; without checking that, you can't know whether you have one author, six, or three dozen. You've come up with an amazing answer, but you've never established that your method justifies your results.
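
To make that concrete: one way to ground a similarity measure is to run it on texts whose authorship is not in dispute and check whether same-author pairs actually score higher than different-author pairs. Here is a toy sketch in Python (the cosine-over-word-counts measure is only a placeholder for whatever measure is being defended; the function names are invented for illustration):

    from collections import Counter
    from itertools import combinations
    import math

    def profile(text):
        """Raw word counts -- a deliberately crude stand-in for any real feature set."""
        return Counter(text.lower().split())

    def cosine(a, b):
        """Cosine similarity between two word-count dictionaries."""
        dot = sum(a[w] * b[w] for w in set(a) & set(b))
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def same_vs_different(labelled_texts):
        """labelled_texts: (author, text) pairs whose authorship is NOT in dispute.
        Returns similarity scores for same-author and different-author pairs;
        a threshold for 'same author' is only defensible if the two
        distributions barely overlap."""
        same, different = [], []
        for (a1, t1), (a2, t2) in combinations(labelled_texts, 2):
            score = cosine(profile(t1), profile(t2))
            (same if a1 == a2 else different).append(score)
        return same, different

Until something like that has been done on a large set of known-authorship texts, a claim that a given number of matching tests establishes shared authorship is just an assertion.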

441faktorovich
Dic 14, 2021, 8:01 pm

>439 Keeline: All of these instances of new ghostwriters being hired after a death to continue a series in the same general style can indicate that the real ghostwriter was not the person who died, though it could have been the person who picked up the effort. If I tested these texts with my method and the quantitative style remained the same, this would indeed indicate that the dead party(ies) were not the actual ghostwriters behind the given text(s).

442faktorovich
Dic 14, 2021, 8:09 pm

>440 prosfilaes: I have provided verification across this discussion. If what I have provided so far is not enough, you have to ask specific questions that would settle your mind on this matter. Or you can ask me to send the series to you for your review, and you can see my full explanation of my re-attributions there. The 27-tests method established six linguistic groups with 284 texts between them in the British Renaissance, which I have assigned to the six ghostwriters I name (Percy, Jonson, Verstegan, Sylvester, Byrd and Harvey) through close biographical research etc. There are only 6 writers because all of the tested 284 texts match one of these 6 groups without any exceptions. If there were more than 6 writers, some texts would not have matched any of these six groups and would have formed added groups of their own. By testing this enormous set of texts in this period, and more in other periods, I have firmly established the match-levels needed to establish single authorship, collaborative authorship between two ghostwriters, etc. The data I posted on GitHub shows the varied types of matches, and you can see for yourself what numbers of tests and what patterns of matches led me to different attribution-assignments. Yes, I have established that my method justifies my results. You should ask a clearer question about what is confusing you about my method, since you cannot see that I have justified it.

443andyl
Dic 15, 2021, 4:27 am

>442 faktorovich: "There are only 6 writers because all of the tested 284 texts match one of these 6 groups without any exceptions. If there were more than 6 writers, some texts would not have matched any of these six groups and would have formed added groups of their own."

You keep on saying that but I think all we are seeing is proof by over-vigorous assertion.

I think we need to see your proofs, your verification, that your computational methods work. That they both correctly identify all the works by an author as being from the same pen AND that they do not confuse writers - including those writers who are trying to emulate another by writing in the same style. Only once we have that do we need to look at biographical research.

The problem is that when you mentioned the 19th-century writers you have tested earlier in this thread, you lost the room. There is much biographical evidence for the Brontës being the authors of the work attributed to them, which would refute your theory of a hidden 'ghost-writer' in that case and cast doubt over your entire methodology.

444bnielsen
Modificato: Dic 15, 2021, 6:21 am

>432 Keeline: Thanks for mentioning Jules Verne. There's a Danish abbreviated translation of Twenty Thousand Leagues Under the Sea that I consider better than the original :-)

But the same series of translations also has some examples of cutting material that are less good. The translator is not credited, so it might have been a good versus a bad translator. I think most translations from English to Danish tend to conserve the number of sentences. I.e. the sentences are translated one by one.

Running some of the tests on original versus translation could be fun.
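
A first pass could be as simple as comparing a few crude per-sentence rates between the two versions. A rough sketch (the file names in the comments are placeholders, not real files):

    import re

    def text_stats(text):
        """A few crude counts: sentences, words, and per-sentence punctuation rates."""
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[^\W\d_]+", text.lower())
        n = len(sentences) or 1
        return {
            "sentences": len(sentences),
            "words_per_sentence": len(words) / n,
            "commas_per_sentence": text.count(",") / n,
            "semicolons_per_sentence": text.count(";") / n,
        }

    # original = open("vingt_mille_lieues_fr.txt", encoding="utf-8").read()
    # translation = open("twenty_thousand_leagues_en.txt", encoding="utf-8").read()
    # print(text_stats(original)); print(text_stats(translation))

If translators really do conserve the number of sentences, the sentence counts should line up while the per-sentence rates drift.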

ETA: I've got to share this too:
Colossus - The Greatest Secret in the History of Computing, https://www.youtube.com/watch?v=g2tMcMQqSbA

At first it doesn't have anything to do with our discussion here, but there's a fun bit about the German language (26 minutes in) because there are a lot of repeated characters in German (especially if your character set doesn't include the double-s character, so you have to write it as ss.)

445faktorovich
Dic 15, 2021, 11:33 am

>443 andyl: The GitHub page includes various types of evidence: https://github.com/faktorovich/Attribution. It includes diagrams like this one for each of the six ghostwriters:



I also added this file to GitHub today, https://github.com/faktorovich/Attribution/blob/0a8abc6a5abbe991366fbdd8dd1646b7..., which includes scans of handwriting samples used across Volumes 1-2 to show how my computational-linguistic attributions match the handwriting styles used in texts in these groups, including the forging of "William Shakespeare's" name sharing a handwriting with "John Shakespeare's" name etc.

The other data tables include all of the raw and processed data that led to the final attribution conclusions, which you can double-check. As I explained before, the writing habits and traits recorded by the 27 tests are not things any writer can imitate, because there are 27 dimensions evaluated, and even one of them (like comma frequency) is not an element a writer can just change through conscious intention. The biographical research is only a part of Volumes 1-2. There are various types of ways in which I prove this case beyond doubt across the series. If you are actually trying to read and evaluate my method, why haven't you asked for a review copy of the series?

An accurate author-attribution method does not care if the conclusions are distasteful, contrary to romantic beliefs about authorship, etc. An accurate attribution method simply reports what the data indicates about authorship. Arguing about the repercussions is for literary theorists, not for data-analysts. Rejecting an attribution methodology because you dislike its findings is like rejecting DNA analysis because it implicated a guy you were sure was a fantastically nice guy.

446SandraArdnas
Dic 15, 2021, 1:38 pm

>445 faktorovich: The data does NOT say anything whatsoever about authorship. It merely shows the level of similarity or lack thereof, and even that is limited to what you actually analyze at all. People have pointed this out a hundred times already in various ways, yet you still insist there's no element of subjectivity involved and no grounds to question or review the methodology. I'm dumbfounded honestly.

447spiphany
Dic 15, 2021, 2:58 pm

>30 faktorovich: So I read a few of your reviews, mostly because I've been puzzling over how one could properly read and review 150 non-fiction volumes a year and have time to do any research of one's own, and also out of curiosity about how you engage with other people's scholarship, a lot of which, given the quantities being reviewed, is surely not in your specific area of expertise.

I have a couple of reactions:

First, you really have a bone to pick with statisticians, don't you? Not just the specific things that have been posted in this thread, but a blanket disagreement with everything that is being done in the field.

Second, the style of the reviews seems more suitable to a personal blog than a literary journal. You're mostly concerned with how the book relates to or is relevant (or not) for your own research project -- the reviews are not, on the whole, oriented towards helping other potential readers decide whether they want to read the book in question.

I'm coming to the conclusion that you're not really interested in discussion at all in this thread, at least not in the sense of an intellectual inquiry and mutual exchange of ideas, because nothing we (or anyone else) might say could possibly prompt you to reconsider even one iota of your findings. What you're interested in is convincing us of the rightness of your claims. And promoting your books along the way.

Sorry, but as much as I enjoy a good intellectual debate, it requires a certain openmindedness on both sides. I don't think that's likely to happen here.

(To those readers of this thread who recognized this from the beginning: feel free to laugh at my naivete.)

448lilithcat
Dic 15, 2021, 3:35 pm

>447 spiphany:

the style of the reviews seems more suitable to a personal blog than a literary journal

Considering that the PLJ is published by the company she founded and of which she is director and editor-in-chief, it pretty much is a “personal blog”.

449faktorovich
Dic 15, 2021, 4:02 pm

>447 spiphany: There are around 600 of my reviews available on EBSCO/ProQuest that are listed on my Google Scholar page: https://scholar.google.com/citations?user=dJD72pMAAAAJ&hl=en. 7 of these have been cited in scholarly articles/ books. I regularly receive "thank you" notes from top-scholars in the various fields I review, and one of these citations (https://scholar.google.com/citations?view_op=view_citation&hl=en&user=dJD72pMAAAAJ&citation_for_view=dJD72pMAAAAJ:kWvqk_afx_IC) includes an official acknowledgement for my help via my review in the book. All of my book reviews are available for free on the Anaphora website: https://anaphoraliterary.com/journals/plj/plj-excerpts/book-reviews-summer-2020. If you had found something specific that is objectionable in my reviews, you would have quoted the part that you are referring to, so I could clarify any potential misunderstanding.

"Statistician" and "computational-linguist" are two distinct categories, and I have never expressed any general "bone" with any job description. I have indeed repeatedly disagreed with specific errors I have found in past computational-linguistic studies I have analyzed, but I have expressed most of these explanations in my Re-Attribution series, and not commonly in book reviews in PLJ. I have picked the elements that I agree with in past studies, discarded the elements I disagree with, and developed my own approach that combines what I learned from reviewing the options; this is the standard process all researchers follow to invent any new method. It would indeed be absurd if anybody disagreed with the concept of statistics. I fully agree with statistics. I disagree with the errors in the misapplication or the misunderstanding of statistical concepts in other computational-linguists past studies.

All of my reviews end with my conclusion regarding who could vs not benefit from reading a given book. I have no idea what isolated review you could have found that does not explain a book's value to readers. And you have to choose if my reviews touch on a wide variety of fields, or if I only discuss points related to my own research, as both cannot be true.

My operation of a publishing company and two journals within it and my authorship of reviews and other content in these journals is the norm among high-output writers like Charles Dickens who wrote for/ edited/published "Household Words" (1850–1859) and "All the Year Round" (1858–1870). I could post extensive examples of Dickens' reviews vs. my own in PLJ to show how my style is standard based on my reading of Dickens', Poe's, and other canonical writers' reviews. I do not read pop reviews, so I do not know what ideas you have about what reviews are supposed to look like.

There are 14 volumes in my Re-Attribution series, which none of you have read. They touch on all of the points raised in this discussion, and I address them in great detail. I asked myself these various questions, then performed an enormous amount of research to reach the definitive conclusions about the Renaissance that I present in the first 14 books, with around 14 more books forthcoming. I have had discussions with top scholars in these fields over the past few years of my research, and I have previously addressed all of their concerns. There is no rational reason for me to change my mind in this discussion, since I did my research long before I published. Imagine if Einstein were doing a discussion about his book on the theory of general relativity, and you objected that you didn't get the sense he was changing his mind (even though you had not read his actual book, and were raising points he had already addressed in the book), and so you were planning to depart the conversation...

450MarthaJeanne
Modificato: Dic 15, 2021, 4:24 pm

You have certainly had plenty of opportunity to convince us that we ought to read your books. I, for one, have no desire to do so. There are many other authors who I would rather spend time with. Like Jane Austen, William Shakespeare, Charlotte Bronte ...

451susanbooks
Dic 15, 2021, 4:45 pm

>450 MarthaJeanne: Jane Austen, William Shakespeare, Charlotte Bronte ...

Let me save you some time: they were all written by the same 2 guys, so if you've read one, you've read 'em all. Why are there even English departments? This stuff is easy.

452faktorovich
Dic 15, 2021, 8:43 pm

>451 susanbooks: Indeed. The study of literature has focused for too long on the biographies of the "Great Authors". Instead, English departments should teach the formulas, the structure, the linguistics, and the other scientific elements of writing, using reading to explain its applications. The point of English departments should be creating Great Writers in each new class of students, and not teaching them that all great writing is in the past and belongs to the unreachable class of the "Famous".

453susanbooks
Dic 15, 2021, 9:57 pm

>452 faktorovich: not teaching them that all great writing is in the past and belongs to the unreachable class of the "Famous".

Where on Earth did you ever get the idea that this has anything at all to do with English Departments? How do you explain all the authors' readings hosted by English Departments? The MFA programs? The "Contemporary ______" classes?

You have the strangest ideas about how the world works. It's as if you come from a parallel universe, where everything that is nonsensical here makes the only sense there.

454faktorovich
Dic 15, 2021, 11:08 pm

>453 susanbooks: All writers that graduate with an English degree should have attempted at least one play, sonnet collection etc., and they should know at least as much about the writing craft as "Shakespeare". The quantity of readings a writer does is not related to the quality of their writerly output. You asked how I would propose teaching literature if many of the "Great Names" were erased, and I proposed the solution that I believe is ideal. Contemporary literature is considerably worse than literature from a century ago, so there is a great need for improvement in this regard. You are the one imagining a parallel universe. I am firmly grounded in the realities of this universe.

455Petroglyph
Modificato: Dic 16, 2021, 12:27 am

This is going to be my final somewhat substantial contribution to this thread. I've put my conclusions first, so you can decide whether or not you want to read the full-length points; and there's an overarching conclusion at the bottom, which I'm content to let stand as my TL; DR for this entire thread.

Below I've listed a number of my concerns with Faktorovich's methodology. I and others have gone on at tedious length about several other concerns (though the tedious posts were definitely mine). The points of criticism I list below have (mostly) not been touched on yet. For this comment I've gone through the relevant pages of Faktorovich (2021; hereafter Re-attribution) where she describes the methodology and the data handling procedures, as well as the spreadsheet she posted (>290 faktorovich:) for the Austen-Bronte-Corelli corpus (>252 Petroglyph:), noting some fundamental errors that, in my view at least, completely, totally and utterly invalidate all of her results. If any "correct" results are achieved with this method, it is by chance rather than design.

------------------------------------------------

Conclusions

From point 1 I conclude that the binary recoding throws away the actual measurements her tests produce. By incompetence or by design, she's guaranteed that, however large her corpus may be (all the English-language books published in 2017; all the works of the Renaissance; all the works published between 1560 and 1926), each text is reduced to a crude 27-bit profile, and every pairwise comparison collapses further into a single similarity score that can only take 28 possible values (0 through 27).

Points 3 & 4 lead me to conclude that, through her systematic, manual alteration of her test results, Faktorovich is taking great pains to ensure that each of her 27 tests will show a ~36% similarity across her corpus. Whether the data warrants it or not. This, of course, greatly inflates her chances of finding "similarities".

Point 5 is a very good reason for expressing scepticism about the entire approach, wholesale. Anyone who makes this kind of mistake in maths should on no account be trusted with any kind of stats.

Point 6 reduces the method's already limited resolving power even more: she is, of course, not identifying minimally different authors (i.e. a score of 13 vs a score of 14; or a score of 12 on some of the 27 tests vs a score of 12 on an entirely different set of tests). She is simply lumping large groups together. Even scores of 7, 8, 9 and 10 have been used to argue for co-authorship re-attributions; and similarity scores of 20+ simply occur too few times for her to get spectacular results. This, obviously, both inflates the number of "similarities" this method "detects" and decreases the total number of distinct "author signatures" even more.

From 6, it is also possible to conclude that Faktorovich is basing her re-attribution of 1500s and 1600s authors quite often on a mere 37% of her improperly counted tests reporting these vastly inflated similarities. A 63% "dissimilarity rating" suffices for Faktorovich to conclude in favour of ghostwriting/collaboration.

If I did not yet have so many reasons to doubt Faktorovich's methods and results, these would surely be enough on their own to, frankly, torpedo any claim she makes on the basis of her methodology.

-------------------------------------------------------------------------

  1. When Faktorovich has run her texts through a service like this, she copies the results into a spreadsheet. Then, she manually changes all the percentages and absolute frequencies to ones and zeros. By quite literally deleting the actual figures provided by online text analyzers, and by replacing them with a binary system of 0 and 1, Faktorovich throws away nearly all of the information her tests produce: each text is reduced to one of at most 2^27 binary profiles, and since all she ever compares is the sum of a row of ones and zeros, a pairwise comparison can only ever take one of 28 values (0 through 27). Many very different profiles collapse onto the same score:


    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 = 1
    0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 = 1
    ...
    1 0 1 0 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 1 0 1 0 0 = 14
    0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 1 0 1 0 0 = 14
    ...
    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 = 27


  2. From here on out, whatever Faktorovich says about her results, what she is talking about is the patterns she sees in these ones and zeros, and the original results from the online text analyzers have now well and truly been discarded.

  3. The "similarity score" consists of adding up all the ones and zeroes for a particular work across a row (each column is a test). The work to be compared to has the maximal score (because it is identical with itself); the score of other works varies. Every single one of these 27 tests is weighted identically: a similarity in absolute comma frequency is counted as a 1, exactly the same value as the sentence length or as lexical density or % of emotional words. Some of these are calculated over the entire text; others over 100 sentences. Treating all of these measures as equivalent is an error that absolute beginners make. In >163 faktorovich: she says that her various tests have different measuring systems, and that they "cannot be compared to each other without changing them into the same types of measurements". Yeah, sure: turning punctuation marks per 100 sentences and also % passive over the entire text into ones and zeros will indeed technically count as "changing them into the same types of measurement". Whether it is useful, or, indeed, competent??

  4. The ones and zeros that indicate similarities are assigned manually, and according to the following rule: for every text that she compares to a corpus, Faktorovich judges the 17-18% of texts immediately above and below it in the spreadsheet to be similar:
    Then, I calculated how many texts would be within the 17-18%-proximity-range for the varied numbers of texts in the full corpus across the stages of this study. When there were only 116 texts, 20 of the closest texts were considered as matches; when the corpus expanded to 284 texts, 50 texts were counted as matches. Some of the other 17-18%-proximities were: 24 out of 132, 26 out of 154, 30 out of 172, 34 out of 196, and 46 out of 266. Whatever the number of texts were that counted as matches within the 17-18%-range, half of these were taken from above and half from below the test-output for a given compared-against output (when all outputs were organized from smallest to largest). (Re-attribution p. 34)


    All of these proportions, of course, amount to ~17% and ~18%, and they do so by design. If a test result for some random text B lies within ~20% (!!) of the text X she's currently looking at, she'll count it as similar enough to suspect co-authorship/ghostwriting, and she'll alter the actual test result to a "1". Everything else gets a "0". In other words: Faktorovich's data does not reveal that 40% of texts are "similar" for each test -- she is actively introducing a consistent percentage of "similarity matches" into her data. Here is a screenshot of her analysis of a text from the Austen-Bronte-Corelli corpus (the screenshot I posted in >252 Petroglyph:; her own analysis of it is in >290 faktorovich:). The yellow row that sums the columns was added by me. Note how every column has at least 4-5 "similarities" across 16 tests. 4/16 = 25%; 5/16 = 31%. This is not coincidental. This is intentional.

  5. Behold the fundamental level at which Faktorovich is confused: she is not counting "anything within 17-18% above or below the test result value" as similar; she is counting as similar however many texts would fall within a 17-18% band on either side, given the size of her current corpus. And by doing the latter she thinks she's doing the former. She, apparently, genuinely thinks that by scoring ~36% of the texts of your corpus as "similar" to your reference text, you have scored the test results that lie within ~18% on either side of the reference text.

  6. Faktorovich is on record in >290 faktorovich: and >305 faktorovich: (and in Re-attribution p. 34) as claiming that a similarity score of 10+ is broadly the point at which she starts thinking about co-authorship. That is to say: if a work-to-be-compared has scored ten "similarities" out of 27 (37%), she'll count it as similar enough to base conclusions about co-authorship or ghost-writing on. Put differently: if seventeen of her 27 tests (63%) are rated "not similar", that is still enough for her to conclude co-authorship or ghost-writing. This is decidedly odd. Now why would she do that? Well, the answer is in her book:
    The highest number of matches occurred in two 25/27 outputs for two nearly identical versions (both with modernized spelling, but alternatively edited) of 2 Henry VI and its alternatively-titled Two Famous Houses of York and Lancaster. There were no instances of matches between 20 and 24 tests. At 19/27, two “Thomas Nashe”-bylined and Verstegan-ghostwritten texts matched: Almond Parrot and Terrors Night. (...) In contrast, most of the texts in this corpus include some degree of editorial or writerly assistance from others. If a text matched any other text on over 13 tests, it tended to have a single dominant ghostwriter without equivalent matches to rivals’ groups. The majority of matches had lower match-levels between 10 and 13 tests. Texts with 8-10 matches were particularly likely to have been produced by two or more ghostwriters in collaboration. There were two matches at 18-tests: the two versions of the Percy-ghostwritten “Shakespeare” play 3 Henry VI and its alternately-named Richard Duke of York version, as well as Percy’s Tempest and Two Noble Kinsmen. Across all outputs, the median number of matching tests was 5; in other words, half of the outputs fell at or below 5/27-matches. The most common output was 4. The upper fence (above which results were technically “outliers”) was 10.5. Given these statistics, matches at 10 or greater were all outliers, or indicated very unusual and statistically significant similarities. (Re-attribution pp- 33-34)


    So. Two works with similarity scores of 25 (the same work! Just slightly differently spelt!). Two works with a similarity score of 19. Two works with a similarity score of 18. What seems to have counted are the 13-17 range and the 8-13 range. In other words: she uses similarity scores of 10+ because her method does not yield enough scores of, say, 20+ to justify her re-attributions. If there aren't enough similarity scores that are actually high, she'll work with whatever is most frequent (even if it's only 37% similar by her own measuring!!!) to perform her re-attributions. A minimal code sketch of this whole pipeline follows immediately below.
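
To make the mechanics described in points 1-6 concrete, here is a minimal code sketch of the pipeline as I have reconstructed it from those points and the passages quoted above. This is emphatically not Faktorovich's actual spreadsheets or GitHub files: the corpus is random toy data, the 18% band and the half-above/half-below rule follow the passage quoted in point 4, and every name, constant and number in the code is an illustrative assumption.

    # Minimal sketch (illustration only) of the pipeline in points 1-6: raw per-test
    # values are binarised by rank proximity, the "similarity score" is the row sum
    # of the resulting ones and zeros, and scores of 10+ out of 27 are treated as
    # attribution-worthy.
    import random
    import statistics

    NUM_TESTS = 27     # comma frequency, sentence length, lexical density, etc.
    PROXIMITY = 0.18   # the "17-18%-proximity-range" quoted in point 4
    THRESHOLD = 10     # "matches at 10 or greater were all outliers" (point 6)

    random.seed(0)
    corpus = {f"text_{i:03d}": [random.random() for _ in range(NUM_TESTS)]
              for i in range(284)}  # 284 texts, as in the Renaissance corpus

    def binarise(corpus, reference):
        """For each test, flag with a 1 the texts whose *rank* falls among the
        ~18% of positions closest to the reference text (half above, half below).
        This is rank proximity -- a fixed count of neighbours -- not 'within 18%
        of the reference value' (the confusion flagged in point 5)."""
        n = len(corpus)
        k = round(n * PROXIMITY)    # e.g. ~50 out of 284
        flags = {name: [0] * NUM_TESTS for name in corpus}
        for t in range(NUM_TESTS):
            ranked = sorted(corpus, key=lambda name: corpus[name][t])
            pos = ranked.index(reference)
            lo, hi = max(0, pos - k // 2), min(n, pos + k // 2 + 1)
            for name in ranked[lo:hi]:
                flags[name][t] = 1  # the raw value is replaced by a 1 (points 1-2)
        return flags

    reference = "text_000"
    flags = binarise(corpus, reference)
    scores = {name: sum(row) for name, row in flags.items() if name != reference}

    # Point 6: the distribution of row sums.  In this toy corpus any text that
    # clears the 10/27 bar does so by chance alone, since the input is pure noise
    # and the share of ones per column is fixed by construction.
    values = sorted(scores.values())
    q1, med, q3 = statistics.quantiles(values, n=4)
    upper_fence = q3 + 1.5 * (q3 - q1)
    print("median score:", med, "upper fence:", upper_fence)
    print("texts scoring >=", THRESHOLD, ":",
          sum(1 for v in values if v >= THRESHOLD), "of", len(values))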


Whatever it is she is counting here, it is not ghost-writership, nor is it collaboration. It's a hopelessly confused mess, that is what it is.

-------------------------------------------------------------------------

Overarching conclusion:
Faktorovich either is so ignorant of what she is doing that she does not realize just how thoroughly she has made sure that her results will be wrong; or she is a troll of the 14-books-written-and-14-more-to-come variety, and this is a very long con. It's no wonder she's so quick with accusations of data manipulation and paranoia about not-her-methods and the like -- every accusation she levels at others (errors in statistics, looking at the wrong things, fabricating data & lying about results) is based on a series of her own fundamental misunderstandings of linguistics, statistics, data handling, computational linguistics, how to count percentages, etc. However, she's been working tirelessly at this method of hers for so many years. She's based 14 self-published books on it. She's planning on self-publishing at least 14 more. These errors have now taken root so deeply that whenever actual statistics or maths or data management deviate from her erroneous methods, they look weird and threatening and wrong. Years of labouring under some pretty basic misunderstandings will do that to you.

Pity, really. Such dedication could have produced actual results elsewhere. But so it goes. Poo-tee-weet.


-------------------------------------------------------------------------
Reference:
Faktorovich, Anna. 2021. The Re-Attribution of the British Renaissance Corpus. British Renaissance Re-Attribution and Modernization Series 1–2. Quanah: Anaphora Literary Press.

456Petroglyph
Dic 16, 2021, 12:00 am

The back and forth with Faktorovich reminds me very much of a passage in The Gold-Bug, a short story by Edgar Allan Poe, in which the hunt for a pirate treasure involves the following steps:

1) lowering a weight to the ground through the eyesocket of a human skull nailed to a high tree branch and placing a peg where the weight touches the ground;
2) drawing a line from the closest point of the tree trunk to that peg;
3) and then continuing fifty feet along that line, which is where the treasure will be.

At first, the protagonists try the wrong eyesocket, and so they dig in the wrong place:
This mistake made a difference of about two inches and a half in the 'shot' --that is to say, in the position of the peg nearest the tree; and had the treasure been beneath the 'shot,' the error would have been of little moment; but the 'shot,' together with the nearest point of the tree, were merely two points for the establishment of a line of direction; of course the error, however trivial in the beginning, increased as we proceeded with the line, and by the time we had gone fifty feet, threw us quite off the scent.

Faktorovich has long since committed a couple of extremely basic errors in her maths/linguistics/data handling. And now she's too far gone to correct all that. About 14 books too far, actually.

457faktorovich
Dic 16, 2021, 12:59 am

>455 Petroglyph: 1. "the maximum possible author signatures that this "method" can detect is 27^2 = 729". This is pure nonsense. If a corpus has 10,000 texts in it, none of them might match each other on the 27 tests. This would indicate that there are 10,000 distinct authorial-signatures in this imagined corpus. Multiplying the number of tests by itself is irrelevant. The point that distinguishes 2 texts as similar is whether they are within 18% of each other on the range of each of the tests, and then there is a degree to which they can be similar if they are proximate on many tests.

2. Another false bit of nonsense: "Faktorovich is taking great pains to ensure that each of her 27 tests will show a ~36% similarity across her corpus." What? In points 3&4 I explained that I don't always use 27 tests (sometimes 28, at other times a handful). And with 6 signatures in the Renaissance corpus, if they were evenly split this would leave around 17% for each group of similar texts per signature. Petroglyph has imagined a random "36%" and is accusing me of manipulating data, when there is no hint of this data point in my data or argument.

3. I did not make any mistakes in point 5. I explained the mistakes in the standard/Petroglyph's approach. Petroglyph is not even naming what this mysterious mistake is, as the point here is just to say general negative stuff without attempting to make rational sense.

4. Petroglyph does not understand my method, as is apparent from this line, "not identifying minimally different authors (i.e. a score of 13 vs a score of 14; or a score of 12 with some of the 27 tests vs a score of 12 with an entirely different set of tests)." Scores of 13 and 14 have equal weight in being very strong authorial-signature matches. The distinction between 13 and 14 does accurately indicate that one is a "minimally" different or stronger match than the other. In the formula I am using to determine similarity, yes, "12" in any combination of tests is given equal weight as any other combination; this is necessary to avoid placing unfair bias towards any one test or selecting some tests as weighty depending on the desired attribution. "She is simply lumping large groups together." This sentence is just nonsense. It has nothing to do with the previous sentence. "Even scores of 7, 8, 9 and 10 have been used to argue for co-authorship re-attributions". Right. Folks in this group have been asking me about how a method can distinguish between editorial, co-writing credits and dominant single-author styles; matches on 7-9 tests record the various degrees of collaboration between two or more authors. There is no error here in my method; this is part of why it works.

5. Changing data from different measuring systems into a binary system is the rational statistical approach. The alternative option Petroglyph seems to be supporting is to compare inches to meters, to percentages, to carrots.

6. I retain all of the original data. I do not "well and truly discard" it. If Petroglyph had actually checked my files on GitHub, he would have seen that there is a table for the raw data, and then the binary tables, and then the final results tables and re-ordered result tables that show the patterns of the re-attributions. You can trace exactly what happened with the data, unlike in the various computational-linguistic methods that have been previously proposed (including those Petroglyph described) that indeed only show the final numbers and diagrams without the raw data that preceded them, as it is indeed "well and truly discarded".

7. "The yellow row that sums the columns was added by me. Note how every column has at least 4-5 "similarities" across 16 tests. 4/16 = 25%; 5/16 = 31%. This is not coincidental. This is intentional." Here Petroglyph has invented a method of his own and is seeing strange patterns in it. He is adding the total number of matches on each of the tests for each of the texts. He finds that the most common number of matches-per-test is 4-5. Let's say the test is # of commas-per-100-sentences. This sum concludes that let's say 4 or 5 texts match on this particular measure; but this does not say which texts match, but rather only the percentage of them that have this element in common. This proves absolutely nothing, unless Petroglyph has misunderstood my method, or is measuring the wrong totals in the final row instead of the final column...

8. 10 matches out of 27 tests is a strong match because each of these matches multiplies the statistical significance of the match, instead of merely adding 1 point out of 27. 10 matches does not mean the 2 texts are 27% similar, but rather that the dice were tossed 10 times and at each toss the 2 dice showed identical numbers; 1 match between the dice is a coincidence, while the 10th match makes it very likely that the dice are rigged to keep showing a match, or there is some other abnormality beyond random outcomes that is returning these repeat matches. If there are 284 texts in a corpus, 2 texts have to appear within 17-18% of each other and to beat the odds repeatedly. If, as I have explained, a corpus like the Renaissance has an extremely high degree of collaboration or co-writing, or re-mixing of texts into sonnet collections by multiple ghostwriters etc., then a 10-match to 2 different authors means (by your count) a 10 + 10 match to these 2 authors being co-authors, for a 20 out of 27 total = 74%.
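
For concreteness, the arithmetic behind this dice analogy can be sketched as follows. This is only an illustration with assumed numbers: the per-test match probability p and the independence of the 27 tests are assumptions made for the sake of the sketch, not findings from the Re-Attribution data, and which value of p actually applies to these 27 tests is precisely what is contested elsewhere in this thread.

    # Back-of-the-envelope arithmetic for the dice analogy above (illustration only;
    # the match probability p and the independence of the tests are assumptions).
    from math import comb

    def prob_at_least(k, n, p):
        """P(at least k successes in n independent trials of probability p)."""
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    # Two fair dice agreeing on 10 throws in a row (p = 1/6 per throw):
    print((1 / 6) ** 10)                 # ~1.7e-08

    # 10 or more matches out of 27 tests, if each test matched independently
    # with dice-like probability 1/6:
    print(prob_at_least(10, 27, 1 / 6))  # ~0.009

    # The tail probability is very sensitive to p; substitute whatever per-test
    # match rate one thinks actually applies to these 27 tests:
    print(prob_at_least(10, 27, 0.18))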

Petroglyph's ramblings have proven one thing, he absolutely has no knowledge of computational-linguistics, just as he has indicated might be the case since he has not disclosed his true identity or any actual credentials he might have in this discussion.

458Petroglyph
Dic 16, 2021, 2:17 am

Poll: Are you convinced by faktorovich's arguments?

Current tally: Yes 0, No 68

459Petroglyph
Dic 16, 2021, 2:17 am

Poll: Are you considering, or at least willing to consider, reading one of faktorovich's books?

Current tally: Yes 0, No 68

460Petroglyph
Dic 16, 2021, 2:17 am

Poll: Do you think it was a good idea of LibraryThing to spotlight this particular author?

Current tally: Yes 5, No 42, Undecided 21

461spiphany
Dic 16, 2021, 4:01 am

>449 faktorovich:
You're of course free to write whatever you choose in a review. Far be it from me to dictate what is right or wrong. I just said the reviews sparked a few reflections on my part, that's all.

No, I didn’t realistically think that a discussion with a handful of random individuals with various levels of expertise in diverse fields would lead you to completely discard a theory that you have been working on for years and poured so much of your heartblood into.

But .... my philosophy is that none of us ever has the final word on what is "true", because knowledge is always expanding and being altered. Most of the time these are not revolutionary insights or paradigm shifts that fundamentally change our view of the world, but little brush strokes that illuminate or clarify or correct bits of the picture that were previously obscure or inaccurate or outside the frame of the canvas.

My experience is that a discussion -- as opposed to a monologue -- encourages participants to look at the topic from another person’s point of view, to consider things from an angle that might not previously have occurred to them. Maybe one finds that a puzzle piece needs to be moved, or removed, or added. Or reconsiders how to explain or approach something that is unclear to someone who is coming from a different starting point. Discussions thus often lead to unexpected insights.

(This pattern does not hold in interactions with evangelicals and other Holders of the Revealed Truth. They follow an agonistic model in which they have won from the outset; if the other person does not come around to their point of view, it is an indication of a failing of that person.)

This opportunity to see things through another person’s eyes is essentially why I read literature -- to share the privilege of visiting someone else’s world. In this sense, the story being told is more important to me than whose name the work is published under.

I can’t help thinking, from the things you’ve said here, that for a scholar of English literature you don’t actually seem to particularly like or enjoy it all that much. Books are merely a collection of facts to be mined and assessed for usefulness. It seems like rather a pity, given that the study of literature is a hard, hard way to make a living.

462EthanoMMG
Dic 16, 2021, 6:52 am

This user has been removed as spam.

463susanbooks
Modificato: Dic 16, 2021, 9:59 am

>454 faktorovich: 'All writers that graduate with an English degree should have attempted at least one play, sonnet collection etc., and they should know at least as much about the writing craft as "Shakespeare".'

I wonder where your PhD is from. You seem to have absolutely no idea what a Humanities degree is. English students learn the art of literary criticism -- of which you understand nothing. Fine Arts Literature students learn both criticism and creation. These are two distinctly different degrees within an English or Literature department. To say that every student must create a sonnet collection (!), one of the more difficult forms of poetry, is absurd. It's like saying an Art Historian must create as many canvases as Picasso. Or a Historian must sign as many treaties, fight as many wars, write as many diaries, as the person she studies. Similarly, by your logic, no one should get an undergrad degree in Philosophy unless they've written as many effective/affective aphorisms as Nietzsche.

None of this is to say students don't also study the act of creation -- my poetry students are invited to write a sonnet instead of a multipage essay for their midterm or final. That's it, 14 lines & they've done the entire assignment. In the 15 years of my teaching that course, maybe 5 students have chosen this assignment, because we've studied sonnets together. We know what a sonnet is & what it requires. Sonnets are like algebra or geometry problems, requiring the same logic, the same transitive, associative, commutative properties as mathematical equations. One doesn't just whip up a collection. Or (hah!) demonstrate as much knowledge of playwrighting as the author(s) of the Shakespeare corpus.

Petroglyph has shown the considerable flaws in your statistical method. These others are just mistakes in understanding how a university works, what "studying" means.

Since you do claim to have a few English/Literature degrees, I look forward to reading a few sonnets from your collection.

464cpg
Dic 16, 2021, 10:16 am

>463 susanbooks:

I am having trouble Googling up how the associative property of mathematics applies to sonnets. Can you point me in the right direction?

465faktorovich
Dic 16, 2021, 10:23 am

>461 spiphany: The mythology of "William Shakespeare", "Christopher Marlowe", "Philip Sidney", "Mary Sidney" and the other "Great Authors" of the Renaissance is the fictional "historical" narrative that "evangelicals and other Holders of the Revealed Truth" have been propagating for 400 years. Members of such Believers groups never realize that they have put faith in a false mythology. In this context, saying that I view literature as "a collection of facts to be mined and assessed" is the greatest compliment, and proves that my side of this argument is not the one that is built on blind-faith.

466faktorovich
Dic 16, 2021, 10:35 am

>463 susanbooks: The term "English degree" includes both the Writing and the Literature curriculums, or both the MFA in creative writing and the PhD in Philosophy of Literary Theory. I am absolutely saying that every college graduate should have created a sonnet collection, a play, a philosophical pamphlet, and texts in a range of other standard genres. No, the sonnet should not be an alternative to a 6-page paper; learning how to measure and execute sonnets should be a standard exercise in any class when they review the Renaissance and read famous sonnets from that period. And when reading a novel, students also dissect it to understand how to write a novel, and then attempt to write a chapter of their own novel. By continually lowering the bar of what college students should do, we are doing a disservice to the planet, as humanity becomes increasingly more dangerous the less it can understand. 12 years of free public education is an incredible gift, which is being misused by turning it into a sociability-contest, instead of a time for people to work full time to further human knowledge and not only to appear to be consuming it. I have written sonnets, poems, plays, novels, etc. etc. before. Before you ask me to send those to you, begin by reading my Re-Attribution series, which happens to include a sonnet collection in Volume 3.

467lilithcat
Dic 16, 2021, 11:51 am

>466 faktorovich:

So now you've moved from "all writers that graduate with an English degree" to "every college graduate"! Talk about moving the goalposts.

And while I certainly believe that college students in STEM fields should have a grounding in the humanities, to suggest that a pre-med student should have "created a sonnet collection, a play, a philosophical pamphlet, and texts in a range of other standard genres" is, frankly, absurd.

468paradoxosalpha
Modificato: Dic 16, 2021, 12:03 pm

Contemporary education sucks. People are allowed to graduate from high school without ever having applied for a patent, exhibited original sculptures in stone, or even read Hegel.

469susanbooks
Modificato: Dic 16, 2021, 1:25 pm

>464 cpg:

The same logic required to solve an equation is used in creating and reading sonnets, the only difference is sonnets use words instead of numbers.

Shakespearean (or English) sonnets have 14 lines of 10 syllables each, divided into 3 quatrains & one rhyming couplet. Each quatrain has a specific purpose: quatrain one sets a question ("Shall I compare thee to a summer's day?"); quatrain 2 develops it into an even more difficult problem; quatrain 3 begins the resolution or complicates the problem further; the couplet is like a solution in mathematics, resolving the original question & its complications without necessarily providing a happy ending.

Each quatrain develops from the other. Word choice, alliteration, allusion, metaphor are like pieces of a math equation in that each one, combined with the rest, multiplies the problem (or resolves it). Each metaphor (love as money, for instance, a common one for Shakespeare) is like x in an equation. As the metaphor is developed x becomes 2x (the problem gets worse) or 3x or whatever. The other elements behave similarly, all combining to produce a linguistic equation that runs along the same lines as any algebraic or geometric formula.*1 In a manner of speaking, (quatrain 1 + quatrain 2) × quatrain 3 = resolution/couplet

Elizabethan courtiers used to write sonnets to pass the time and engaged in sonnet contests. There were countless Renaissance sonneteers. The reason comparably few sonnets survive is bc most were bad; it's incredibly difficult to get the equation to come out right, there are so many variables (syllable count, line count, development of metaphor, effectiveness of alliteration, aptness of allusion, etc).

In answer to your original question, Wikipedia says, "In propositional logic, associativity is a valid rule of replacement for expressions in logical proofs." In a sonnet the associative property applies to metaphors, alliteration and other literary devices; if love = money (metaphorically) in quatrain 1, then you can simply talk about money in quatrain 2 & the reader knows you're also talking about love. If the sustained "oh" sounds in a quatrain echo as moans (as in Shakespeare's Sonnet 30: "And heavily from woe to woe tell o'er"), then further "oh" sounds or references to noises of distress, associatively = the original meaning and use of "oh," times whatever has been added/changed.*2

*1 Sonnets are so similar to equations that quatrains were often referred to as rooms, with the full sonnet functioning as a building. As in architecture, each piece must be perfectly balanced to hold up the next. Thus Donne writes in "The Good Morrow," "For love, all love of other sights controls,/ And makes one little room an everywhere." He's promising his beloved that his poetry (a quatrain/room) can at once contain their love and allow it to expand beyond physical borders, while also playing with the idea that two lovers in one room need nothing else; their love is the universe.

*2 Wikipedia offers these examples of the associative property:
((ab)c)d
(ab)(cd)
(a(bc))d
a((bc)d)
a(b(cd))

Removing d (since there are only 3 quatrains, so quatrain a, quatrain b, & quatrain c), one can say, as I did above, that (ab)c = couplet. But just as easily, I might have said a(bc), or quatrain one's problem times the sum of the multiplication of quatrains 2 & 3 = couplet, etc.

470Keeline
Modificato: Dic 16, 2021, 12:53 pm

>441 faktorovich:

All of these instances of new ghostwriters being hired after deaths to continue a series in the same general style can indicate that the real ghostwriter was not the person who died, though it could have been the person who picked up the effort. If I tested these texts with my method, if the quantitative style remained the same this would indeed indicate that the dead party(ies) were not the actual ghostwriters behind the given text(s).


I wanted to let this sink in a bit before replying to it. You have not seen these texts (though some could be found online) but you are ready to make a conclusion of how your system would assess them? That sounds like the conclusion precedes the analysis.

A common thread in stylometric studies is to apply mathematics, specifically statistics, and apply a scientific method to creative works in the humanities. This is most often done by identifying parts of interest and counting them and making comparisons with other examples to see if any patterns emerge that could be interesting to discuss.

Indeed you have made many references to famous scientists along the way in this thread that is approaching 500 replies in a short period of time.

Of course any one of the science disciplines requires many years to master but there are some foundational principles.

For example, whenever possible, a scientist tries to make a hypothesis and then design and conduct experiments that test one variable at a time to see if it is relevant to the question and the deeper understanding of that corner of the universe.

Part of this is also achieved with a control which is compared to the experimental subject to note the differences.

In taking 27 specific tests (or occasionally 26 or 28 based on what you wrote recently) what efforts have been made to isolate those tests to see that they are individually relevant?

Forecasting the weather is a complicated (and often wrong) area of science dealing with many variables in a complex system. Still the people in that field have to assess each element being measured to determine if they are significant contributors to the whole system. Which items are sources of change and which ones are the result of the change? At some point the air pressure was found to be a leading indicator of weather in the next day or so. The color of the sky — blue to gray — is usually an after effect of whatever the weather is at that moment and is not thought to affect the future weather. Factors that can be measured like temperature, humidity, air pressure, local acceleration due to gravity, UV light intensity and wavelength that reaches the ground, wind speed and direction, and others are all considered. Chances are there are many other things that could be measured but they would need to be evaluated to see if they have an effect on the system as a whole.
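
For the stylometric case, a crude version of that kind of isolation check might look like the sketch below. The feature values and author labels are invented purely for illustration (they are not drawn from anyone's data in this thread); the idea is simply that a candidate measurement is only individually useful if it varies less within one author's securely attributed texts than it does across different authors.

    # Rough sketch of isolating a single candidate test: for one measurement
    # (say, exclamation marks per 100 sentences), compare its spread within a
    # known author's texts to its spread across the whole corpus.
    # All values and labels below are invented for illustration only.
    from statistics import pstdev

    known = {
        "author_A": [3.1, 2.8, 3.4, 3.0],
        "author_B": [7.9, 8.4, 7.2, 8.1],
        "author_C": [3.2, 3.0, 5.9, 8.0],   # unstable: varies as much as the corpus
    }

    all_values = [v for vals in known.values() for v in vals]
    between_spread = pstdev(all_values)

    for author, vals in known.items():
        ratio = pstdev(vals) / between_spread
        verdict = "stable for this author" if ratio < 0.5 else "unreliable on its own"
        print(f"{author}: within/between spread = {ratio:.2f} -> {verdict}")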

I am not clear at this point that the 27 counts you apply are specifically relevant to an authorship attribution. Some might be equivalent to the local acceleration due to gravity where it can be measured but might have nothing to do with the weather. Some might be the result of the way that English is written to be understood. For example there is an order for adjectives that makes sense to readers or listeners. If you mess with the order, it becomes jarring. It is described on this page and others. Here is the list you will find there:

1. Quantity or number
2. Quality or opinion
3. Size
4. Age
5. Shape
6. Color
7. Proper adjective (often nationality, other place of origin, or material)
8. Purpose or qualifier

So this pattern is a result of the way the language had evolved. It might seem to be a style (conscious choice or unconscious pattern) but it is really just something that helps with clear communication.

The question of the authorship of Camp Fire Girl stories published under the "Irene Elliott Benson" name has never been and will never be a burning question (pardon the pun) in the field of children's literature. I brought it up because the example shows how sometimes a real person's name is used as a pseudonym.

As I mentioned, there are other cases in this field which can be interesting experiments for stylometric study. But I see it as an interesting facet and not the sole focus of the study.

James

471susanbooks
Dic 16, 2021, 1:42 pm

I just showed my partner this thread. Said my partner: This is like arguing with an anti-vaxxer. They have no knowledge of science & infinite confidence in their ignorance.

472faktorovich
Dic 16, 2021, 4:54 pm

>470 Keeline: I offered an educated guess regarding your corpus based on my previous studies of texts written after an "author's" death. My guess(es) were not meant to be even a hypothesis designed for future study, but rather merely a guess that was part of a casual online discussion. Theorizing regarding likely outcomes of a study before or instead of carrying it out is an essential part of science; such theorizing can lead one to conclude that running an experiment on a given question is unnecessary, or is not likely to lead to revealing or significant discoveries. It is also important for all scientists to imagine how experiments might go as a general practice, as they consider thousands of possible experiments and their outcomes and contemplate what these various outcomes would imply. Telling a researcher they have made an error by offering their stated guess is like telling scientists to stop hypothesizing about the nature of black holes until they can dive into one and find out what happens. I have been explaining across this thread that problems enter the process of such guessing when the researcher starts with the assumption (something accepted to be true without proof of fact; this is very different from a guess, an estimate of something made without knowing the answer and without claiming it as a fact) that they know who wrote some of the texts in a given corpus because of bylines or previous re-attributions; assumptions lead to biased re-attributions of texts because they are pre-grouped as if the bylines are true before the experiment begins; guesses do not set any bylines, but rather speculate on what the bylines could be, to avoid errors before starting an experiment.

Yes, I have written sections in Volumes 1-2 that explain patterns and findings from each of the 27 individual tests. All of them are occasionally enough to accurately distinguish authorial-attribution, but because they are also occasionally individually wrong, only a combination of different tests can present accurate results consistently.

I do not measure the order of words. Grammatical word-order was different in the Renaissance vs. today. I spend several chapters in Volumes 1-2 explaining why these 27 tests are significant and why they represent elements that are constant in writers' unconscious habits. Please clarify what you are trying to ask. Are you saying you don't understand how the rate of exclamation-usage can be relatively constant for any one writer vs. another writer?

473faktorovich
Dic 16, 2021, 4:57 pm

>471 susanbooks: Yes, I have felt across this discussion that you, SusanBooks, and others are arguing as if you are anti-vaxxers who "have no knowledge of science & infinite confidence in their ignorance." I am glad your partner saw this as well. Denying the scientific findings of the computational-linguistic method I invented, and the myriad of other types of proof (including dozens of confirming handwriting-matches) I have been presenting is truly anti-scientific and ignorant.

474Petroglyph
Modificato: Dic 16, 2021, 6:23 pm

>471 susanbooks:
In all such discussions (with antivaxxers, young-earth creationists (or "cdesign proponentsists"), flat-earthers, climate change deniers, or, as in this case, an anti-Stratfordian) I think it is useful to keep Brandolini's law in mind (a.k.a. the bullshit asymmetry principle). I don't think, however, that we should let such unscientific nonsense pass unopposed.

475susanbooks
Dic 16, 2021, 6:33 pm

>474 Petroglyph: "I don't think, however, that we should let such unscientific nonsense pass unopposed."

Agreed!

476amanda4242
Dic 16, 2021, 8:35 pm

>468 paradoxosalpha: I blame low admissions standards. Imagine wasting all those resources on people who can't recite Virgil in Latin!

477faktorovich
Dic 16, 2021, 8:41 pm

>474 Petroglyph: Pro-Stratfordians have been posting the "unscientific nonsense" in this thread that I indeed could not let "pass unopposed". It has taken a great deal of energy on my part so far to refute the piles coming at me; but I write around 5,000 words per day on average even when I am not fighting against the pile, so I have much more energy to give to this cause.

478faktorovich
Dic 16, 2021, 8:45 pm

>476 amanda4242: The resources being wasted in the US are in the trillions of dollars in student loans these accepted students take out that they cannot repay afterwards because the low-standards of learning in college fail to prepare them to invent, to create, and to find humanity-furthering employment (self-employment and self-sustainment being the ideal for a satisfying life).

479paradoxosalpha
Dic 16, 2021, 9:24 pm

>473 faktorovich:, >477 faktorovich:

"I know you are, but what am I?" I'm starting to find crunchy un-popped kernels at the bottom of the barrel.

480SandraArdnas
Dic 16, 2021, 9:26 pm

I am completely and utterly amazed at the willful blindness of the author. Perhaps it bears pointing out that in the case of scholarly publishing the maxim that any publicity is good publicity is not true, and this thread has in fact served to drive potential readers away. I'd frankly cut my losses at this point, instead of pushing on with this unfounded attitude of superiority.

481faktorovich
Dic 16, 2021, 10:59 pm

>480 SandraArdnas: Any rational scholar who finds this thread and reads it with an unbiased perspective will agree with the overwhelming proof I have presented. If nobody of rational-mind finds this thread; then, no sales or the spread-of-knowledge will follow. I just finished reviewing Seneca's letters: "Retreat into yourself, then, as much as you can. Spend your time with those who will improve you; extend a welcome to those you can improve.” Seneca adds that attempts to “display… talent” to the public in “debate” can fail if the intellectual “merchandise” is not “suitable for this populace; as it is, there is nobody capable of understanding you. Perhaps somebody or other will show up, and even that one will need to be instructed, to teach him how to understand you.” If no student shows up, I am going to continue my studies and writing after this temporary attempt to share what I have learned; half of the Renaissance series remains to be written.

482bnielsen
Modificato: Dic 17, 2021, 12:45 am

Surely no true Scotsman will fail to see who's right and who's wrong in this discussion.

http://www.valkyriearms.com/articles/thirtyeight%20dishonest%20argument%20tricks...

483SandraArdnas
Dic 17, 2021, 7:01 am

>481 faktorovich: The poll shows you clearly what any scholar would say. I'd heed it, but that's just me; I'm averse to willful blindness.

484MrAndrew
Dic 17, 2021, 7:23 am

>469 susanbooks: Thank you, that was fascinating.

485MarthaJeanne
Dic 17, 2021, 7:38 am

I rather doubt that learning to write sonnets would be in any way useful to current students. Even those few who want to become Elizabethan courtiers will find it less useful in the times of Elizabeth II than it was in Elizabeth I's.

Learning how to express opinions and research findings in a way that might convince and interest other people would probably be no easier, but very much more useful.

486spiphany
Dic 17, 2021, 7:39 am

>465 faktorovich:
It's funny, but I've seen little to nothing in this thread that would indicate the presence of these diehard supporters of the "mythology of 'William Shakespeare,' 'Christopher Marlowe,' 'Philip Sidney', 'Mary Sidney' and the other 'Great Authors' of the Renaissance" who refuse to accept your work because it would destroy a treasured belief.

In fact, the authors have hardly been mentioned at all, except in the context of questions about how you have taken into account period-specific aspects of their writing like spelling fluidity and editorial influence on things like punctuation -- something that is relevant regardless of who the authors are and whether they turn out to be the person the texts have been traditionally attributed to.

Nobody has said that it's unthinkable anyone could have written Shakespeare except for Shakespeare himself, because he is such an unparalleled genius, or because the literary establishment couldn't possibly be wrong, or any other arguments that suggest we are upset by your iconoclasm.

Rather, we've been asking about your logic and your methodology, which, if they are all you claim, should be applicable to any set of authors. Hence, we've discussed a range of other cases where it is known that ghostwriting or co-writing took place, and different techniques for disentangling the authorship thereof, and the attendant challenges, and whether your methods would work here. Those of us with more statistical training than I have been asking about your math, which does not care whether it's being applied to Shakespeare or to John Smith's e-mails.

Now, the fact that Shakespeare and Austen and the Brontes have been studied so extensively does mean one thing: that the evidence supporting your claims has to be all the more compelling because it goes against a lot of cumulative knowledge that so far has tended to support a different conclusion.

One reason some of us have been skeptical of your claims is that stylometry doesn't seem like a robust enough tool to provide that kind of compelling evidence. Again, not because of whose authorship you are challenging, but because there are so many reasons why certain linguistic markers in a text might reveal similarities.

As I said, a discovery that Shakespeare had not written all or some of his texts would not fundamentally alter my opinion of his plays, because I'm primarily interested in how they work on stage and how they speak to a variety of aspects of human experience. That's not to say that authorship is entirely irrelevant, but I'm not about to put Shakespeare up on a pedestal and I recognize that the image of Shakespeare in the popular imagination is itself a construction that doesn't necessarily resemble the real historical person. I'm OK with that. As a classicist, I'm perfectly comfortable with the fact that numerous ancient texts were traditionally assigned to an author who probably didn't write them (pseudepigrapha), or that the poet who composed two of the most influential works in Western literature is almost certainly mostly a fiction.

>471 susanbooks:
I wasn't going to go there, since the response is exactly what I expected, but I will note a few other features of the rhetoric used by anti-vaxxers and their ilk:
- the conviction that the mainstream/the establishment are wrong and engaged in some kind of coordinated action to cover up the truth
- the suggestion that those who accept the existing consensus among experts/scholars have been brainwashed or are "sheep" who are blindly following authority
- they see themselves as "free-thinkers" who are the only ones brave enough to recognize the truth and challenge accepted ideas; they feel that they are a minority being unfairly persecuted because said establishment feels that they are a threat. Disagreement with their ideas is interpreted as confirmation of their correctness.
- the claims usually contain some kernel of truth, which has been taken out of context, misunderstood, or otherwise applied in a dubious fashion to make the facts fit the theory

487spiphany
Dic 17, 2021, 8:56 am

>486 spiphany:
(For those of you who might be wondering whether I've completely gone off the deep end and bought into some postmodern idea about the death of the author and works' meaning being so timeless that we can ignore their historical context -- I'm not claiming that authorship doesn't matter. Such questions do, in fact, matter a great deal in the case of works where their authenticity (or lack thereof) is a key aspect of how we read and understand them. Any reading of, say, Sylvia Plath's Bell Jar is going to be colored by what we know of the author's biography, and if someone were to reattribute her work to a middle-aged white male ghostwriter who comfortably lived out his days in a suburb somewhere, this would indeed have some pretty dramatic implications for how we understand the book. But the plays attributed to Shakespeare draw heavily on existing plots and material, as was typical of literature of the time, and their primary purpose isn't to process the author's personal experiences. So, while attributing these plays to some contemporary of Shakespeare instead of to Shakespeare himself might change how we think about some of the details in the plays, it wouldn't necessarily require a radical revision of how we understand the themes explored in, for example, King Lear.)

488norabelle414
Dic 17, 2021, 9:03 am

>480 SandraArdnas: any publicity is good publicity is not true

Unfortunately I think it is true in this case. None of us would have ever known she existed if LibraryThing had not chosen to publish an interview with her, and now several people in this thread have read her work and are trying fruitlessly to engage with it in good faith instead of letting her languish in obscurity as such utterly unfounded claims from a belligerent personality should. There are a hundred more eyes on her now than there were before the interview, and that's a hundred more than the work deserves. This kind of attention can always be spun into a positive, especially by someone who already thinks their work is invalidating hundreds of years of scholarship.

489SandraArdnas
Dic 17, 2021, 11:00 am

>488 norabelle414: But zero people in this thread would ever consider reading the work (thanks Petroglyph for the poll), so the thread itself is hurting rather than helping attract readers. Not sure about the interview. I first stumbled on the thread and it hasn't enticed me to read the interview, let alone the books.

490SPRankin
Modificato: Dic 17, 2021, 11:50 am

>488 norabelle414: I could not agree more. Though I suspect that your estimate of a hundred eyes should be revised upwards. I imagine I haven't been the only lurker!

And yet...the whole experience has been so utterly engrossing from start to finish (and I highly recommend a minute or two of light online sleuthing) I'd be a hypocrite if I said I haven't thoroughly enjoyed every word. I bless the name of the friend who shared it with me. Also, I am in genuine awe of the patience, forbearance, and good humor of the people in this discussion (is that the right word?) who have given the author the immense gift of taking her work seriously.

I say this fully expecting "...utterly engrossing from start to finish...enjoyed every word" to end up as a blurb on a satisfyingly garish book cover.

491faktorovich
Dic 17, 2021, 1:39 pm

>485 MarthaJeanne: Learning only how to write a sonnet is indeed useless today. Learning how to write in the various structures of writing that include the sonnet, the short story, the philosophical pamphlet, the play, and other complex formulas is what can give students the ability to think creatively and to write in any unique structure that might become relevant and useful for them later in their life. The point of complex, structured writing and research into the science of literary creation is to learn that progress in human understanding is made by bending language into new structures and new ideas.

492faktorovich
Dic 17, 2021, 1:57 pm

>486 spiphany: The "math" makes up only a few chapters in the 14 books in just my Renaissance series. The 12 books of translation include 11 plays written/ghostwritten by William Percy, the main tragedian behind the "William Shakespeare" byline. If you are "interested" in how "Shakespeare's" plays "speak to a variety of aspects of human experience"; the rational thing for you to do would be to ask me for free pdf review copies of the 14 books, so you can compare my first quarto or your first folio translation of "Hamlet" with the 10 other "Shakespeare" books on if they "speak" differently or in a similar manner about the "human experience". Theorizing about your passionate for "Shakespeare's" plays without reading the annotations/ introductions that explain their similarity in terms of words only used by Percy etc., and the translated versions of these never-before-translated "Shakespeare" plays is illogical.

Your new attempts to position me as a "conspiracy theorist" show that you are the one with passionate extremist views that you are intensely defending. You are the one imagining that the "establishment" could be engaged in a "coordinated action to cover up the truth". My books are simply stating what the truth is, and that the establishment has been wrong or has been stating falsehoods in this regard. By insisting that my argument should not be heard, you are in fact attempting the "cover up" you have imagined, instead of just putting the two theories side-by-side and considering who is right in a climate where both are allowed to speak without suppression. I did not suggest "brain-washing" by those who believe in "Shakespeare", but rather I have said that they have not considered that the evidence (even without my computational-linguistic findings) has long ago proven the "Shakespeareans" to be wrong; I doubt anybody has washed the "Shakespeareans'" brains - they simply have not read the volume of evidence against this position that I have read in my research for this series. I doubt you can argue that I am in the "majority" among those speaking in this thread, since all of you have argued against my position; so it is a fact that I am in the "minority" (at least here) in the argument I am making. The "Shakespeareans" only base their position on "some kernal of truth" - the six forged "Shakespeare" signatures being the only bit of evidence that this was a real person - and have taken these out of the context of this handwriting matching other signatures from this period, etc., to draw the currently widely accepted mythology of "Shakespeare". How can you refuse to review my evidence for free in my series, and still insist that you are on the side of rational research?

493faktorovich
Dic 17, 2021, 2:09 pm

>487 spiphany: In fact, it would turn how we perceive "Shakespeare" on its head. Briefly: 1. William Percy's father (8th Earl of Northumberland) was assassinated while he was imprisoned in the Tower of London over accusations of rebellion (and for sending his sons to study in Paris) in 1585 (when William started writing plays), 2. Percy's uncle (7th Earl) had been executed over the Rising of the North in 1572 (when William was a youth), 3. Percy's brother (9th Earl) was imprisoned for decades in the Tower starting a year after the Workshop helped James I unite the Scottish and English crowns and take the throne (without being next in line) by ghostwriting books attributed to James I (despite James not knowing any English before his move to London to take the crown). William Percy's brothers (including the Earl and other brothers in this large family) invested in William Percy's plays (these investments are documented as I explain in the series), and their goal was to showcase the evils of the monarchy by showing good kings such as Lear being harassed by power-hungry children, while bad kings such as Macbeth ruthlessly fight for power. If "William Shakespeare" wrote these plays, he was a country-actor without any personal stake in the actions of kings and queens; if William Percy wrote them, he had a political revenge motive to right the myriad of wrongs his family endured. These are just a few examples I cite in 14 books of various types of evidence.

494SandraArdnas
Modificato: Dic 17, 2021, 2:16 pm

>492 faktorovich: How can you refuse to review my evidence for free in my series, and still insist that you are on the side of rational research? The irony here is painful. You gloss over anyone's objections as if they were never uttered and just parrot your thesis, yet you expect people to immerse themselves in the tomes of your work, which by the way you can't even properly summarize nor can you convince anyone of the validity of your method, among other things because you don't address the issues raised.

Doggedness alone cannot make people respect your work and you can't strongarm anyone into reading it by insinuating that anyone on the side of rational research would. Such perfidy. You actually need to convince people you're worth their time, not the other way around. So far, you've failed miserably in that respect. Try better

495.mau.
Dic 17, 2021, 3:27 pm

>488 norabelle414: actually I read an earlier book by Dr Faktorovich, which was offered in ER in 2017 :-)

496paradoxosalpha
Dic 17, 2021, 3:38 pm

>495 .mau.:
Ah, but clearly not on the "strength" of the present discussion.

497.mau.
Dic 17, 2021, 3:47 pm

>496 paradoxosalpha:
let's say I learned a lot of interesting things by reading this thread (I have little to no training in linguistics, but since I have a degree in maths I could follow the discussion)

498spiphany
Dic 17, 2021, 4:22 pm

>492 faktorovich:
Oh, do pray tell, what fringe conspiracy have I been pushing in this thread?

And where have I insisted that your argument "should not be heard"? Indeed, I and many others have been engaging in this discussion, asking for clarifications and generally giving you a chance to present your case.

The fact that we are not convinced by it is not at all the same thing as trying to silence you.

By the way, my comments about the rhetoric of anti-vaxxers were just that -- about anti-vaxxers. I leave it to readers to decide for themselves whether similar strategies can be identified in any part of this thread. If it doesn't apply to you, wonderful: I'm delighted that you are just as open to other people's ideas as we've tried to be towards yours.

499faktorovich
Dic 17, 2021, 8:57 pm

>498 spiphany: As we approach 500 comments in this thread, it is dumbfounding to me that only Petroglyph has asked for a review copy of my series, and he only quoted 1 paragraph out of it, and did not even read that one paragraph. Everybody else is entirely content to keep discussing something without reading the series itself, even when it is offered for free. If my series is not "worth their time", why is everybody here investing time in commenting? I cannot imagine going into a thread in LibraryThing about Researcher X's Y-Series, and posting hundreds of comments there about how awful his series is without even attempting to ask for a review copy. The cover-summary of the series I previously posted describes what it's about. Such cover-summaries are pretty much all a reviewer has when deciding whether or not to review a book.

500Petroglyph
Dic 17, 2021, 9:45 pm

>499 faktorovich:
"it is dumbfounding to me that only Petroglyph has asked for a review copy of my series"

and

"I cannot imagine going into a thread (...) and posting hundreds of comments there about how awful his series is without even attempting to ask for a review copy"

But then at the same time:

"pretty much all a reviewer has when deciding on reviewing a book or not"

Even this complaint is entirely nonsensical. But I guess you've got to say something in order to keep eyes on your self-published ramblings.

501SandraArdnas
Dic 17, 2021, 9:47 pm

>499 faktorovich: It's very simple. Your case is void and people don't care. How fucking entitled can you be to expect that anyone is obliged to read your frigging books. Do you have any idea how many truly interesting and insightful books there are in the world? By people who actually know what they are talking about I might add

502amanda4242
Dic 17, 2021, 10:01 pm

>499 faktorovich: it is dumbfounding to me that only Petroglyph has asked for a review copy of my series

And I'm dumbfounded that you don't understand why nobody wants to read a work that has been so thoroughly debunked and the author shown to have little to no knowledge of the subject in which she claims expertise.

503faktorovich
Dic 17, 2021, 11:17 pm

"How well he's read, to reason against reading!" --Love's Labors Lost

504Petroglyph
Dic 17, 2021, 11:55 pm

>503 faktorovich:
Hey, I recognize that title. That's by Shakespeare, isn't it?

505faktorovich
Dic 18, 2021, 12:09 am

>504 Petroglyph: No, it's by William Percy.

506Petroglyph
Modificato: Dic 18, 2021, 12:38 am

Aaawww, I was expecting at least 500 words. Or at the very least a paragraphless copy/paste from one of your self-published books. Oh, whither this two-hundred-and-fifty-score daily word count, much-touted? Has the bounce gone from your bungee?

507bnielsen
Dic 18, 2021, 7:28 am

>499 faktorovich: Just a service message: A quick count of the previous 506 comments reveals that nobody has "hundreds" of comments.


...
11 andyl
11 bnielsen
11 paradoxosalpha
12 Keeline
15 melannen
16 lilithcat
16 MarthaJeanne
16 spiphany
24 susanbooks
84 Petroglyph
164 faktorovich

508Petroglyph
Dic 18, 2021, 9:17 am

Just reporting on some silliness.

Faktorovich has referenced the number of pages* of her book The Re-Attribution of the British Renaissance corpus quite a few times in this thread (search for "698" and you'll get 9 results, including this one; 2 of those are mine).

I can report that the final page of her index is page 696. This is followed by one (1) page of ads for the other volume in this series, one (1) blank page, and one (1) back cover.

So even here she is overselling her efforts. It's bluster all the way down.

"There, altogether, hung Grendel's grasp". Beowulf, by Sîn-lēqi-unninni, or else by Bede.

(* a tell-tale sign of her unscholarly approach to impressing people unfamiliar with academic publishing)

509andyl
Dic 18, 2021, 10:06 am

>507 bnielsen:

Ahh but faktorovich has run all our posts through her computational processes and has proved that there is a single ghostwriter behind us all.

510thorold
Dic 18, 2021, 11:27 am

>508 Petroglyph: or else by Bede.

Perhaps this could be the golden opportunity to test the claim made by Sellars and Yeatman about Bede’s authorship of The Rosary?

511faktorovich
Dic 18, 2021, 1:06 pm

>509 andyl: You guys have accused me of concluding "there is a single ghostwriter behind us all" at least 5 times now. Are you attempting to confess something? I have not previously replied to this point as it seemed irrelevant. But when you keep repeating any point, it must be the central point that is troubling you.

512susanbooks
Dic 18, 2021, 1:11 pm

>511 faktorovich: sometimes a cigar is just a cigar.

Are you familiar with the verb "to joke"?

513Petroglyph
Dic 18, 2021, 1:27 pm

>512 susanbooks:
Oh crap, I posted that under the wrong pseudonym. How do I undo?

514susanbooks
Dic 18, 2021, 1:48 pm

>513 Petroglyph: I wouldn't worry. Important, irrefutable scholarship proves that William Percy has authored everything that has been or can be written. We all know it's you, William.

515faktorovich
Dic 18, 2021, 3:52 pm

>514 susanbooks: Anything that causes laughter can be a "joke", the noun form of this word; the act of causing laughter is the verb. If a statement is stated without causing laughter; then, it has not actually been the act of "joking". The repeat questions regarding if everybody else in this discussion (except for me) are various bylines for a single ghostwriter have not caused me to laugh, so they were not jokes from my perspective. Your repeating suggestions that my computational-linguistic findings are as nonsensical as discovering that a single author "authored everything that has been or can be written" have also not made me laugh, and thus they are not jokes to me. My findings are entirely rational. My "Re-Attribution" study of the Renaissance concluded there are 6 ghostwriters that were operating across this century. My study of the 18th century found many more than 6 different authorial signatures working in Britain in a relatively small corpus (most only wrote under their own bylines without collaborating or ghostwriting under pseudonyms). Even my mini-experiment that we ran as part of this discussion on 16 texts found 4 distinct signatures between the 5 tested bylines. So, if your goal is to make jokes; you should work on studying the art of humor-writing.

516lilithcat
Dic 18, 2021, 6:40 pm

Vote: I think the posts suggesting that everybody in this discussion (other than faktorovich) is a set of bylines for a single ghostwriter are amusing.

Current tally: Yes 52, No 3, Undecided 3

517Crypto-Willobie
Dic 18, 2021, 8:26 pm

Are the ballads and novels of Thomas Deloney written by one of the ghostwriters?

518lilithcat
Dic 18, 2021, 8:31 pm

519Bushwhacked
Dic 18, 2021, 8:35 pm

After all this, the only mystery I am vaguely left with is the question as to whether Petroglyph is a man or a woman. Perhaps I need to read the thread again more closely. Alas, I'm just a middle-aged white male potential ghostwriter, leading an otherwise dull suburban life. Oh... and I have run out of popcorn.

520faktorovich
Dic 18, 2021, 8:43 pm

>517 Crypto-Willobie: I tested "Blind-Beggar of Bednal-green", which is currently misattributed to "Deloney", and it matched Ben Jonson's signature. The broadsheets attributed to him were not tested, but are more likely to match Percy's style, as I found Percy's hand in other similar pieces. All of the Renaissance novels I tested matched Richard Verstegan's style. So "Deloney" was written by at least three different ghostwriters. The texts currently attributed to "Deloney" are likely to have been initially anonymous and to have been re-attributed to "Deloney" intuitively by literature scholars across the past 4 centuries.

521Crypto-Willobie
Dic 18, 2021, 8:55 pm

I hate to bring it all down to common sense when all the slings and arrows of statistics and stylistics are being deployed... but c'mon people! All the literature 1550 to 1650 -- sermons, plays, poetry, novels, histories, broadside ballads, you name it-- EVERYTHING -- was written by ONE of SIX people?????????????????????

I don't need mathematical formulae to tell me this is nonsense.

522Bushwhacked
Dic 18, 2021, 9:28 pm

>521 Crypto-Willobie: I for one can see profit in accepting the good Doctor's theory! Based on the work of Dr Faktorovich I'm working on a theory of my own that the Mystery Renaissance Ghostwriter was actually a Time Travelling Alien from Zeta Reticuli, who also had a hand in the construction of the Pyramids as well as the Nazi Megastructures on the Far Side of the Moon. My ultimate objective is to sell this as a 10 part series to The History Channel. Somebody's gotta get rich out of this.

523Crypto-Willobie
Dic 18, 2021, 9:58 pm

>522 Bushwhacked:
I too can see Faktorovich's work as grist for the satire mill except that
I am aweary of ahearing of it...

524faktorovich
Dic 18, 2021, 11:23 pm

>517 Crypto-Willobie: Try this experiment with me. Go to this page: https://quod.lib.umich.edu/e/eebo?key=author;page=browse;value=de, and this page: https://quod.lib.umich.edu/e/eebo2?key=author;page=browse;value=de. Search for "Deloney". Click to open all of the titles that come up for this author in separate tabs. You are going to find 28 different titles. Search for the original bylines on each of these texts' title-pages, which will be the first page to open. Not the bylines in the "Author" line that are the current assignments, but the original publisher-attributed bylines. For example "Thomas of Reading" is signed with only the initials "T. D.", and these initials repeat in a few of the other texts without the full name appearing as it does here in the title, such as "The royal garland". All of the "T. D." initialed texts appear to have been automatically re-attributed to "Deloney" by previous scholars even though the more common name actually used on texts with these initials is "Dekker, Thomas". Some texts were originally without any bylines at all (they are anonymous), like "Strange histories, of kings, princes, dukes earles" (this is a collection of musical poems that were printed by "William Barley", one of William Byrd's collaborators under his music-publishing-patent, so these are likely to be ghostwritten by Byrd); this text was published anonymously in 1602 before gaining the direct "Thomas Delone"-byline in 1612 (just after Byrd retired from writing for the Workshop). Then there are anonymous texts without a date on the title-page that have been clearly misattributed such as "The Spanish lady's love", and there are anonymous dated texts like "A proper new sonnet declaring the lamentation". As is typical for most bylines from this century, the only book I found with the full name of this "author" is the 1612 re-printing of the previously anonymous "Strange histories", and it is probably a satirical nickname for Byrd on account of Byrd becoming a "de lone" countryman during his retirement. Don't read "Deloney's" biography; just look over this full list of texts currently attributed to him, and try to figure out why they are attributed to him. If you want to go further, look for scholarly books/articles that have re-attributed them to "Deloney", and as you read through the errors in these re-attributions, you should start to believe that my proposal for 6 ghostwriters is far more credible and evidence-based than whatever it was that has happened in the past 4 centuries to lead to the current attributions.

525bnielsen
Dic 19, 2021, 4:51 am

>521 Crypto-Willobie: But they all have the signature Nine! according to this:

https://dilbert.com/strip/2001-10-25

526Bushwhacked
Dic 19, 2021, 5:57 am

Fast forward a thousand-odd years into the future and they'll be debating whether Macbeth was written long ago in the mists of time by Roman Polanski, Bill the Bard, or Hugh Hefner...

https://www.youtube.com/watch?v=Zp70jXJFX9M

527Bushwhacked
Dic 19, 2021, 6:05 am

>516 lilithcat: Well... based on the figures to date (Y29 N1 U2) there's at least one neurodivergent here and it's not me.

528faktorovich
Dic 19, 2021, 9:35 am

>527 Bushwhacked: "Neurodiversity or ND, refers to variation in the human brain regarding sociability, learning, attention". I explained above specifically why the attributions to "Deloney" by past scholars have been erroneous. In response, instead of focusing on reviewing this evidence, and learning from this new information, you are antisocially tossing an insult back at me.

529lilithcat
Dic 19, 2021, 9:54 am

>528 faktorovich:

"Neurodivergent" is not an insult.

530amanda4242
Dic 19, 2021, 1:55 pm

>527 Bushwhacked: While neurodivergent is not an insult and I do not believe your post violates the TOS, I find myself uncomfortable with your labeling people who do not find certain posts humorous as neurodivergent. Rather than crack wise about neurodiversity, how about we stick to the subject of the thread: the debunking and mocking of pseudo-scientific conspiracy theories.

531faktorovich
Dic 19, 2021, 3:15 pm

>530 amanda4242: Dear Amanda: Yes, indeed. I wrote a detailed paragraph explaining how the past scholarly re-attribution of 28 texts to "Deloney", when only 1 of them has a variant of this name in its byline, is a type of "pseudo-scientific conspiracy theory". I also explained how my systematic computational-linguistic method and research into documentary sources correct these previous researchers' errors. It would be logical if one of you read this explanation and replied to it instead of sliding off-topic into nonsense again.

532SandraArdnas
Dic 19, 2021, 4:11 pm

>531 faktorovich: Ah, so you think one should respond to the arguments presented. Strange, you never do.

533anglemark
Dic 19, 2021, 4:58 pm

I am the other author who uses the byline anglemark (as you can see from our profile page, there are two humans sharing this account; I am the one who almost never posts to Talk). I happen to be a (corpus) linguist, and I'm a bit troubled not only by the fact that LT chooses to highlight this pseudolinguistic research experiment, but also by the fact that it is presented as a "newly-invented method", without a single word about the body of existing research in corpus stylistics, historical corpus linguistics, and related fields. Faktorovich's method is admittedly different from those used in other studies in that it is neither scholarly nor scientifically valid... but the weaknesses of her methodology have already been discussed at length, and I don't think I would succeed where the excellent and in-depth explanations by petroglyph and others have failed. (FTR, faktorovich, in >524 faktorovich: you don't explain why you believe that texts previously attributed to Deloney were not written by him – you only say that his texts were first published anonymously or with his initials, which doesn't prove anything, and neither does the fact that more than one author had the initials T.D.)

Corpus stylistics is a very valid academic field however, and for anyone who is genuinely interested in exploring how we can use electronic corpora and computer tools to investigate style, here is some suggested reading:
* Geoffrey Leech & Mick Short: Style in Fiction
* Elena Semino & Mick Short: Corpus Stylistics
* Michaela Mahlberg: "Corpus stylistics", chapter in The Routledge Handbook of Stylistics
* Patrick Studer: Historical Corpus Stylistics

Of these, Leech and Short is perhaps the most accessible one; it is often used as a textbook at the undergrad level.

Since Shakespeare has featured prominently in the thread, here is an article showing one example of how his style can be (constructively) analysed, the kind of features that can be selected for analysis, and what it can show us about how Shakespeare selected specific stylistic elements:
Murphy, Sean. “I Will Proclaim Myself What I Am: Corpus Stylistics and the Language of Shakespeare’s Soliloquies.” Language and Literature, vol. 24, no. 4, Nov. 2015, pp. 338–354, doi:10.1177/0963947015598183.

534Bushwhacked
Modificato: Dic 19, 2021, 5:22 pm

>528 faktorovich: Dr Faktorovich, I am delighted to make your direct acquaintance at last. Your work, as I see it, offers a unique value proposition to extrapolate into the popular market for our mutual benefit - please refer to my post >522 Bushwhacked:. I propose the establishment of either a partnership or corporation to fund the outlined television series. As I am not domiciled in the US I suggest that we may need to hire a lawyer with sufficient knowledge of the rules, and it appears >530 amanda4242: may be able to provide the expertise we will require. I sincerely thank >516 lilithcat: for bringing us all together.

535prosfilaes
Modificato: Dic 19, 2021, 5:34 pm

>437 faktorovich: They are designed to be inaccessible to users (including blocks where you need permission from the creator to use it), so that users instead pay programmers to use them for them.

Hah. As programmers, we don't usually make things inaccessible to users on purpose. In fact, programmers actively try to make things more accessible to users, only to find that when you've given users all the flexibility they could want, you've effectively created a new programming language that only programmers could use. Part of this may be because programmers don't think like nonprogrammers do, but at a certain point, you can't make things simpler without giving up functionality.

536faktorovich
Dic 19, 2021, 5:41 pm

>533 anglemark: Like others who have introduced themselves as "(corpus) linguists" in this thread, you do not reveal your full name in your profile. You cannot claim the credentials of a professional "(corpus) linguist" without at least giving your name, so that your previous publications in this field can be verified. Without this information, you could either be puffing your own articles when you recommend "Leech... Short" and the other pieces you cite, or you could have no actual credentials in this field and merely be claiming to be qualified in order to give your opinions the appearance of superiority.

I have "newly-invented" my computational-linguistic author-attribution method because it is an approach that has not been described before in this field. It is accurate, as I have been able to defend it across this discussion and across my Series, neither of which you have read. I have discussed the "body of existing research in corpus stylistics" across this thread in great detail, and have pointed out the countless errors in the standard approaches that have been previously used in this field.

You appear not to have understood my explanation regarding the "Deloney" byline, so I will re-state it. 1. Out of 28 texts currently erroneously claimed to be by "Deloney" in EEBO, only 1 has this name anywhere inside these Renaissance books, and even there it is misspelled as "Delone". 2. Around half of the rest have the "T. D." initials (this means they were anonymous, since these initials could have equally applied to several possible bylines including "Dekker's"), and the other half are anonymous and do not have any byline on them. 3. Across the past 400 years, despite there only being evidence to attribute 1 book to "Delone", scholars have erroneously used nothing but their imaginative intuition to assume 27 of these other anonymous texts were also written by "Deloney" and added his byline to them in fictitious biographies of him and in catalogs and lists such as EEBO. Now your question, "why you believe that texts previously attributed to Deloney were not written by him". Because across all of "Delone/y's" life and for decades after it, neither he nor any publisher attributed any more than 1 book to his credit. There is no documentary proof to the contrary; most of these credits are for poetry/ballads, so there wouldn't even be a performance credit. Catalogers/scholars have chosen in the past to let their entirely unsupported desires stamp bylines onto 27 anonymous texts. Are you arguing that once any scholar, cataloger, or just anybody with a pen scribbles a byline onto a book, an overwhelming volume of proof is needed to remove this false assertion, but not even a sprinkle of proof was needed to make it in the first place? I have already explained that I ran one of these texts through my 27 tests and it matched Jonson's signature, and I explained that the printer of the one "Delone" book, "William Barley", was one of Byrd's pseudonyms, as "Barley" was granted the music-publishing monopoly after it passed from Byrd to "Morley" and finally to "Barley". I also explained that all 5 of the novels I tested proved to have been ghostwritten by Verstegan, so it is extremely likely that Verstegan also ghostwrote any novels attributed to "Deloney". You can check the data on my GitHub to confirm this is the case, as the novels are listed there with their attributions. And I explain the "Barley" pseudonym in the Byrd section of Volumes 1-2 of the series.

537faktorovich
Dic 19, 2021, 5:46 pm

>535 prosfilaes: I am not arguing about what programmers "usually" do, but instead about the inaccessibility and un-usability of the computational-linguistics author-attribution tools that I have read about and attempted to use across my research. Anybody can create a language (programming or non-programming); the value and benefit of a usable language lie in the dictionary and grammar book that go with it and explain how to use it. Without such guides, a programming language, or a program, is as nonsensical as a made-up language without such tools.

538amanda4242
Modificato: Dic 19, 2021, 7:08 pm

>536 faktorovich: you do not reveal your full name in your profile

Their profile clearly gives their names as Johan and Néa, and the LT author box shows their last name is Anglemark.

ETA: An extra ~30 seconds of sleuthing leads me to believe >533 anglemark: was authored by Linnéa Anglemark, who is employed at Uppsala University. You can find a list of her publications on her University profile. https://katalog.uu.se/profile/?id=N2-1645

539prosfilaes
Dic 19, 2021, 6:53 pm

>536 faktorovich: It is accurate, as I have been able to defend it across this discussion and across my Series, neither of which you have read. I have discussed the "body of existing research in corpus stylistics" across this thread in great detail, and have pointed out the countless errors in the standard approaches that have been previously used in this field.

There doesn't seem to be anyone who is convinced by your arguments yet, so "I have been able to defend it" seems to be puffery.

I had never heard of Thomas Deloney before today, so I have no opinion on the correct attribution of the works attributed to him. This discussion has prejudiced me against accepting your arguments on the matter, though.

I saw mentioned on the Internet a man who was claiming arithmetic is inconsistent. I looked it up, thinking it would be the standard crank thing, but people were treating it seriously. That was because the claimant was a mathematician, and because, instead of self-publishing or self-promoting, he was discussing his ideas (online) with other people competent in mathematics before attempting to publish them; and when one of them pointed out where he had made a mistake, he listened and accepted that he was wrong. If it had been a fixable error, he presumably would have done like Andrew Wiles (prover of Fermat's Last Theorem) and sat down and figured out what he needed to do to work around the problem, instead of claiming that it wasn't a problem. That's how you make extraordinary claims.

A claim that six ghostwriters wrote the entire British Renaissance is something I'd be skeptical of even if a reputable scholar in the field claimed it; I might wait until it's something actually showing up in textbooks as truth before accepting it. The fact that instead of using standard methods (which you dismiss as having countless errors), you're using your "newly-invented ... computational-linguistic author-attribution method" isn't a win. The fact that you haven't given a single example where you've run it over existing data and come up with the expected results seriously calls it into question. The first thing you put on a scale is a set of objects with known weights. Is it accurate? Without such a test, no one should believe it.

540Petroglyph
Dic 19, 2021, 7:30 pm

>536 faktorovich:
"you do not reveal your full name in your profile"

This is just more evidence of you reading sloppily and then jumping to conclusions in hastily-written diatribes.

Also, since you've brought the "people post under pseudonyms online" thing up a few times upthread -- as something that reflects negatively on pseudonymous posters -- I feel compelled to ask: is this your first week on the internet? Really, faktorovich, your lack of understanding of the reasons why people might not want to post under their real names has strongly negative implications for your own (pseudo-scientific) research.

"I have been able to defend it across this discussion and across my Series, neither of which you have read."

Well, it would be fairer to say that you have repeated yourself at length. Saying things that are not true multiple times does not make them more true. You also do not know whether >533 anglemark: has read this discussion. Stating categorically that they haven't is a straight-up lie; the only honest answer is "I don't know".

Let's not beat about the bush here. You have provided enough information in the many thousands of words that are your interview, your cover summary, the quotes pulled from your self-published books, the engagement in this thread, the examples of your methodology in this thread, and the "clarifications" you think you've offered in response to questions. The criticisms in this thread are justified, and so is the decision to refuse to read your pseudoscience. Many people have decided it is, in fact, not worth their time (see >459 Petroglyph:), and that is a decision that is both fair and justified -- not something you can hold against those people and be honest at the same time.

Your attitude of "I'm dumbfounded that people won't read my books before they criticize me" is not consistent with the facts -- it's a cheap rhetorical trick that allows you to criticize people for not devoting an absurd amount of time (14 books!) to your pseudoscience, while also avoiding the adult responsibility of recognizing when to cut your losses; it permits you to blame your audience (for, supposedly, not engaging enough with you), and to pretend that, wherever the error/responsibility lies, it is certainly not with you.

Demanding that your audience read not only the tens of thousands of words in this thread, but also a 696-page book, or even a 14-book series before you feel they've done you justice is an unfair and irresponsible expectation. It is, in fact, a goalpost you've placed so far away that you'll never really have to admit that you're wrong, a hoop you can comfortably hold up, knowing no-one will jump through it.

This is not how proper scholars argue. This is a red flag indicating that some argumentative shenanigans are afoot.

Treating an online moniker as a justification for unfairly casting doubt on someone's motivations is another such flag.

I sometimes read other kinds of pseudoscience -- young-earth creationist poppycock, linguistic conspiracy theories (e.g. "Latin is derived from Hebrew"), and "Troy was in England"* type nonsense. The red flags present in Faktorovich's style of arguing are immediately recognizable to anyone familiar with any other form of history-rewriting pseudoscience.




*The river Cam, which flows through Cambridge, is, supposedly the river Scamander.

541faktorovich
Dic 19, 2021, 7:43 pm

>539 prosfilaes: At this point, even if I said the sky is blue, all of you would disagree just because I said it. I have fully defended my method. Nobody here is a judge, so no single opinion can rule against my method. And science is not decided by a democratic vote of the majority. A scientific proof is frequently perfectly correct, and yet it can be doubted or refuted by the masses, or by academic gate-keepers. One example of why your side of this debate is wrong is that you are insisting you "have no opinion" on the "Deloney" attribution in particular, and thus you are saying that you did not follow my challenge for you to check EEBO for yourself to establish that only 1 out of 28 texts assigned to this "author" actually carries this byline; you cannot refuse to look at the presented evidence, and then insist no evidence was presented. As I explained before, I have heard all of the objections raised in this discussion before from peer reviewers of my research; I have explained to them and to you how these objections are incorrect, and there are no actual mistakes in any step of my method, or in its entirety. The most-repeated objections to my method in this thread have been that it involves data-entry and that I am converting multiple measuring units into a comparable binary system; these are not mistakes - they are the most rational steps to accurately solve attribution mysteries consistently. This is the first time anybody here has proposed sitting down and working "around the problem". I am not at all opposed to sitting or working; if you see a problem, explain what it is in this thread or email me at director@anaphoraliterary.com.

"Reputable scholar"? You mean you only trust things famous "scholars" say? I have published a couple of scholarly articles on my attribution method, and have mentioned related topics in the two scholarly books I published with McFarland. If you do not believe anything until you read it in a textbook, why are you engaging in this discussion before that point? Textbooks are part of the problem, as they repeat fictitious attributions made by catalogers and others who do not engage with scholarly research and just assume "Deloney" wrote 28 books, and not only the 1 with this name on it. A textbook is a simplified summary of stuff researchers have previously said that is designed for high school/ college students, so any scholarly book should be more trustworthy as a result. I have given countless examples of bylines matching other texts with that same byline across the centuries. You are proposing that, to be correct, an attribution method has to find 100% expected answers in a given large corpus? This is the reason the previous methods have failed to find the major re-attribution shifts I have found; because folks like you call byline-contradicting findings or re-attributions of past attributions "errors" in the proposed method. The presence of ghostwriters, pseudonyms, and collaboration in corpuses of British/English literature from the Renaissance to the presence is explained by my findings regarding the founding of English literature with a Ghostwriting Workshop. The Workshop was assisted by Elizabeth I to set up a publishing system in England that required asking the monarch to grant a monopoly to publish in a given field (music, textbooks etc.); the Workshop also was forced by the anti-vagabond laws to keep their independent authorship contracting secret, thus creating the need for anonymity or pseudonyms. There are various other pressures I explain in the series that forced the publishing industry to make a significant portion of its profits from sponsored or paid-for publications (such as being paid to write government/ religious propaganda, or being paid by an aristocrat/merchant to put their byline on a poetry book to make themselves more appealing to women, or on a rhetorical book to become a more believable politician, etc.). I could pick a set of texts that would not include any unexpected results; but the point of all of my experiments has been correcting past misattributions, and not testing texts that I already know the authorship-attributions for. When Petroglyph could have chosen any set of texts for me to test to prove my method, he deliberately chose the "Brontes" texts that I had already previously found to include the 2 vs 3 linguistic-signatures contradiction; thus, Petroglyph was also more interested in checking for misattributions, vs. designing an experiment that could have proven by re-affirmation its ability to spot a byline-accurate corpus.

542CurrerBell
Dic 19, 2021, 8:04 pm

Really, I do want everyone to know that I am the author of Jane Eyre as well as Shirley, Villette, and The Professor. My dear youngest sister is the author of both Agnes Grey and The Tenant of Wildfell Hall. Emily wrote Wuthering Heights (and I beg you not to pay any heed to that dastardly canard that that cad Branwell had anything to do with Wuthering Heights or any of our writings).

The only reason Anne alone accompanied me on my first visit to my publisher, George Smith, is that Emily is terribly shy in public and was furthermore preoccupied with cleaning the bedlinens that Keeper kept soiling.

The only "ghosting" involved is that I cleaned up some of the grammar and punctuation in Emily's poetry prior to publication.

I very much appreciate the channeling Mike Ehling has done for me over the past fourteen years here on LibraryThing. I had hoped to secure the channeling of Shirley MacLaine, but she was too overburdened by her channeling of Ramtha (who assures me that he has never ghostwritten anything).

543faktorovich
Dic 19, 2021, 8:36 pm

>542 CurrerBell: You have proven that ghostwriting exists. Thus, since your group's main objection to my findings has been that I see too many ghostwriters, which you have been suggesting must be figments of my imagination, you must concede that this objection is mute, as you yourself have proven the opposite.

5442wonderY
Dic 19, 2021, 9:49 pm

545faktorovich
Dic 19, 2021, 11:11 pm

>544 2wonderY: Yes, you have found the problem. I have committed this one spelling error across this thread, in which I have nearly 200 comments... It is moot.

546Petroglyph
Modificato: Dic 19, 2021, 11:56 pm

>541 faktorovich:
"thus, Petroglyph was also more interested in checking for misattributions, vs. designing an experiment that could have proven by re-affirmation its ability to spot a byline-accurate corpus"

Again, faktorovich, I am forced to tell you not to put words into my mouth. You assume too much and then proceed to type your logorrhea, in which you attribute motivations to me that are patently not true, and that you have no way of knowing.

"I could pick a set of texts that would not include any unexpected results; but the point of all of my experiments has been correcting past misattributions, and not testing texts that I already know the authorship-attributions for."

Again, you've either missed or avoided the point that >539 prosfilaes: was making -- which is the same point that many of us have been making (re: Iain (M) Banks, James Joyce, your own writings, etc.). It's the strangest thing. You keep "misunderstanding" and "avoiding" this point.

"At this point, even if I said the sky is blue, all of you would disagree just because I said it."

Not so. If you made a testable claim that turned out to be correct, we'd agree. As long as you call things by their wrong byline, we will disagree.

I'm reminded of this scene (from Shakespeare's The Taming of the Shrew), in which an odd character makes absurd claims that are obviously wrong to everyone; they also try to bully people into agreeing with them by being very obnoxious:

PETRUCHIO. Come on, a God's name; once more toward our father's.
Good Lord, how bright and goodly shines the moon!
KATHERINA. The moon? The sun! It is not moonlight now.
PETRUCHIO. I say it is the moon that shines so bright.
KATHERINA. I know it is the sun that shines so bright.
PETRUCHIO. Now by my mother's son, and that's myself,
It shall be moon, or star, or what I list,
Or ere I journey to your father's house.
Go on and fetch our horses back again.
Evermore cross'd and cross'd; nothing but cross'd!
HORTENSIO. Say as he says, or we shall never go.
KATHERINA. Forward, I pray, since we have come so far,
And be it moon, or sun, or what you please;
And if you please to call it a rush-candle,
Henceforth I vow it shall be so for me.
PETRUCHIO. I say it is the moon.
KATHERINA. I know it is the moon.
PETRUCHIO. Nay, then you lie; it is the blessed sun.
KATHERINA. Then, God be bless'd, it is the blessed sun;
But sun it is not, when you say it is not;
And the moon changes even as your mind.
What you will have it nam'd, even that it is,
And so it shall be so for Katherine.


547Petroglyph
Modificato: Dic 20, 2021, 12:00 am

More reports from faktorovich's silly book.



*This is what Faktorovich actually believes. I am not making this up.

548Keeline
Dic 20, 2021, 12:13 am

>533 anglemark: The first of these can be read online from the Internet Archive:

https://archive.org/details/styleinfiction00geof

James

549Bushwhacked
Dic 20, 2021, 12:21 am

>542 CurrerBell: Well, I appreciated your sense of humour, even if it appears to have evaded the comprehension of our distinguished contrarian interlocutor!

550faktorovich
Dic 20, 2021, 12:26 am

>547 Petroglyph: Thanks for correcting the attribution from "Shakespeare" to Jonson on my behalf for the "Shrew". Did you notice the section in Volumes 1-2 on the "Spenserian stanza"? If you search for the term "Spenserian", you will discover that all texts that feature this unique rhyme scheme were ghostwritten by Harvey. And if you read the chapters on Harvey and Elizabeth I, you would discover that Harvey wrote about meeting Elizabeth while performing a speech for her as a student; Harvey's ghostwriting of "Elizabeth's" letters is proven with documented facts. I have explained that the 284 texts I tested include all of the main canonical texts from this century, so obviously they include all texts currently assigned to "Elizabeth", "James I", etc. And you are making a spelling mistake with "Fairy Queen" - using the old-spelling version of this title is as erroneous as using "Iliás" instead of "Iliad", or "Leir" for "Lear" (Percy mixed the latter two up).

551Bushwhacked
Modificato: Dic 20, 2021, 12:52 am

>547 Petroglyph: ... curious, I recently watched a distinguished historian's multi-part documentary on Elizabeth I, during which he presented what I thought to be credible evidence of her literary skill and articulate written proficiency from a very young age. I guess I've been misled by the establishment intelligentsia once again :(

Clearly it's us middle-aged white male ghostwriters who deserve all the accolades, whatever century we're in!

552prosfilaes
Modificato: Dic 20, 2021, 12:57 am

>537 faktorovich: >437 faktorovich: assumes malfeasance. In reality, writing documentation is hard and not always in the skill set of programmers. For R, however, there's Introductory Statistics with R or The R Book or R in a Nutshell or The Art of R Programming, so it's hard to claim R lacks guides, and, not speaking for any particular guide, quite unlikely they're all bad. Even if they are, many, many people seem to have managed to learn R anyway.

>541 faktorovich: A textbook is a simplified summary of stuff researchers have previously said that is designed for high school/ college students, so any scholarly book should be more trustworthy as a result.

Scholarly books are much more likely than textbooks to push idiosyncratic theories that only the author believes in. A textbook may be out of date, but it will give consensus answers.

You are proposing that, to be correct, an attribution method has to find 100% expected answers in a given large corpus?

The only way we can know how accurate an algorithm is is if we feed it problems that we know the answers to and compare the results to the known answers. That's called testing.

Most algorithms on natural language problems aren't going to be 100%, and algorithms often turn up mistakes in the tests. But there's no way to know how well an algorithm works if you don't feed it data you already know the answers to.

We don't know your method works, and providing a bunch of claims that disagree with what is otherwise believed does nothing to prove that.
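To make the testing point concrete: here is a minimal sketch of the kind of check being asked for, in which an attribution method is first scored on texts whose authorship is not in dispute before it is trusted on disputed ones. The `attribute` function and the labelled corpus are hypothetical placeholders, not any method actually presented in this thread.

```python
from typing import Callable, Dict

def validation_accuracy(
    attribute: Callable[[str], str],   # hypothetical: maps a text to a predicted author
    known_texts: Dict[str, str],       # text -> undisputed author
) -> float:
    """Fraction of undisputed-author texts for which the method recovers the right name."""
    if not known_texts:
        return 0.0
    hits = sum(1 for text, author in known_texts.items() if attribute(text) == author)
    return hits / len(known_texts)

# A method that cannot score well here, on texts with uncontested bylines,
# gives no reason to trust its verdicts on contested Renaissance texts.
```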

The presence of ghostwriters, pseudonyms, and collaboration in corpuses of British/English literature from the Renaissance to the present is explained by my findings regarding the founding of English literature with a Ghostwriting Workshop.

What? Such things are present in a vast array of literatures across the history of writing on the planet. As someone else pointed out, writing under the name of a more famous author is very common in ancient works. You really think that you need to provide an explanation rooted in the founding of English literature?

553Bushwhacked
Dic 20, 2021, 1:21 am

>550 faktorovich: ...as for the Faerie Queene I'm going with Petroglyph... if I recollect correctly, it was also the spelling when I was at university (admittedly a long time ago... in fact a previous century, good gosh!). I also note that great repository of knowledge for the Common Man, Wikipedia, appears to be in agreement. https://en.wikipedia.org/wiki/The_Faerie_Queene

But whilst we're here, my good Doctor, I don't suppose you could throw us an analysis of who actually wrote John Donne's erotic verse? I'm hoping it was Good Queen Bess, that would stir the history books up a bit!

554CurrerBell
Modificato: Dic 20, 2021, 2:01 am

>545 faktorovich: Congratulations on your current adherence to orthographical standards, with only (I'll take your word on it) a single lapse in this thread. That's a better record than in your novel The Romances of George Sand, as I pointed out in my review of the same some seven years past.

Incidentally, I would not want it thought that I object in general to non-native speakers who write in the English language. The Secret Agent has become one of my favorite novels since I read it just after its first publication some hundred and fourteen years ago (a gift from my erstwhile publisher Mr Smith), and its author in my opinion is far superior to the positively horrid Miss Austen. In fact, my dear Arthur and I had that distinguished author to tea just two weeks after his arrival in his new domicile on 3 August 1924.

But you, Doctor Factorovich, are no Joseph Conrad.

555anglemark
Dic 20, 2021, 3:00 am

(Linnéa speaking, still)
>536 faktorovich: You make a number of incorrect assumptions about what I have and have not done. I have identified myself unambiguously, I have read this thread (I spent several hours on it the day before yesterday), and I have been to EEBO as well as to your github repository to check at least some of your claims.

That you suspect that I might be Geoffrey Leech or Mick Short is insanely flattering (even though it would be scary if Leech were to return from the great beyond to post here), but it is also a little concerning that you have been doing work in stylistics and still don't appear to recognise those names. Which previous research do you refer to in your books and use to support your method?

Your approach is not newly invented by yourself; it is a form of multifactor analysis applied to stylistics, and I know for a fact that you have read Register, Genre, and Style, so how can you claim to have invented that approach? I mentioned some other books on the topic above. Your method, more narrowly, is indeed your own.

A textbook is written for an audience of students, but that does not invalidate the information it provides. A textbook written by leading scholars in the field, to be used when teaching undergraduate courses at a university, is almost the best introduction anyone could have to that field. (>548 Keeline: Thank you! That is an old edition, though, and it has been pretty substantially updated – the latest edition is from 2007. Still, for the purpose of getting a general understanding of the topic it's still good.)

556andyl
Dic 20, 2021, 4:38 am

>536 faktorovich:

It took me all of 5 seconds with google to find the full name and university of the right half of anglemark. Surely your research skills can't be that bad?

557faktorovich
Dic 20, 2021, 1:18 pm

>551 Bushwhacked: A few excerpts from Volumes 1-2 explain the "Elizabeth" question:

To win this patent, Byrd enhanced his social standing in part by serving as a music tutor during his early years at Court. In Byrd’s 1577 petition for financial assistance from Elizabeth after his first patented music publication from 1575 failed, Byrd argues that he suffered hardships because his tutoring income was truncated due to the re-direction of his labors on publishing and writing under the music patent. Elizabeth agreed and granted lands to Byrd and Tallis to sponsor their continued publishing. Queen Elizabeth I’s interest in playing virginals with instruments has also been documented in an MS of music that has since been misattributed as Queen Elizabeth’s Virginal Book, before it was again re-attributed as the work of “Tallis, Bird, Bull” and others (George Hogarth, "Musical History, Biography and Criticism, Volume I" (London: John W. Parker, 1838), 57-8, 60-1.). Even without testing, it is obvious that it was written predominantly by Byrd alone...

In contrast with the notion that the dates on the Queen’s records have been firmly authenticated, there have been several studies questioning this dating and the degree of their pre-publication editing. The first published speech attributed on the title-page to Elizabeth I was released in 1601. This date is based on the entries in the Women Writers Online (WWO) database, which is perhaps the fullest collection of rare manuscripts from this early period. The speech released in 1601 is titled Her Majesty’s Most Princely Answer, delivered by herself at the Court at White-hall, on the last day of November 1601: When the Speaker of the Lower House of Parliament (assisted with the greatest part of the Knights, and Burgesses) had presented their humble thanks for her free and gracious favor, in preventing and reforming of sundry grievances, by abuse of many Grants, commonly called Monopolies. The byline is revealing: “The same being taken verbatim in writing by A. B. as near as he could possibly set it down.” Some of the Workshop’s aliases satirically hint at the wrong author as is the case with Verstegan’s employment of the “W. Har” byline to hint at Harvey’s hand in an abbreviated version of his last name. Most of their initialed bylines are intended to be potentially later re-attributed to a variety of potential ghostwriting-contractors. But these “A. B.” initials are most likely to be a satirically selected set of the first two letters of the alphabet. And the note regarding setting “it down… as near as…” possible to the actual speech is not typical for such publications, so it is intended to be interpreted as a confession of an overzealous ghostwriter. There are no other publications in WWO attributed as “by” Elizabeth I until after the closure of the theaters in 1642. They were published in the last few years of Percy’s life, when he was the only surviving Workshop ghostwriter left alive until 1648. First came Queen Elizabeth’s Speech to her Last Parliament, which appeared without a date on the title-page, but is believed to have been issued in 1642. It was followed by A Most Excellent and Remarkable Speech, which is described as “Printed for Humphrey Richardson” on January 28, 1643. Meanwhile, one of the books that is not fully under “Elizabeth’s” name in WWO is the 1615 “William Camden’s” Latin version of Annales, the True and Royall History of... Elizabeth Queen of England, which includes the speech claimed to have been given by Elizabeth I on February 10, 1559 to Parliament in response to a petition urging her to marry; it was re-issued in an English translation in a 1625 London quarto. The British Library prefaces this document with the warning: “Camden adapted the speech for print and, although the essence is the same, there are numerous verbal differences between his text and earlier manuscript records of it (such as Lansdowne MS 94/14), so it is not a reliable record of the exact words spoken in Parliament.” Similar attribution and dating complaints are detailed in Kristen Abbott Bennett’s article on her attempts to trace the history of one of Elizabeth’s speeches, known as the “Golden Speech”, which is also the only speech published within Elizabeth’s lifetime in 1601 “A. B.’s” edition. Bennett learned that there are “mismatched titles and contents” for the versions of this speech in the WWO database from 1628, 1642, 1679 and 1693. 
Bennett concludes that the problems partially began for this speech when the 1628 printer confused the “Golden Speech” with Elizabeth’s “Last Speech to Parliament”, and later printers repeated this mistake. Given the ease with which later printers repeated this misnaming, it is a certainty that printers also believed the dates claimed on the handwritten manuscripts of these speeches and letters without authenticating these through a forensic analysis. The first solid date in this timeline is the 1601 publication, and it is late enough for all of the Workshop’s members to have contributed to these far from “near… verbatim” transcriptions.

558faktorovich
Dic 20, 2021, 1:40 pm

>553 Bushwhacked: "Donne's" "Songs and Sonnets" (1633) matched Jonson as the primary, and Byrd as a secondary author; Jonson appears to have taken musical poems Byrd wrote before his death and added to them his own poems or re-wrote small segments until the book had his voice as the dominant. Jonson relied on Byrd's earlier pieces because this was his only sonnet/poetry collection across his bylines as he specialized in drama; similarly Percy's only sonnet collection was "Coelia" and the rest of his texts were dramas. Harvey had previously collaborated with Byrd, or re-wrote and added to Byrd's poetry in the "Donne"-registered "Anatomy of the World" (1611); Byrd had decreased his original output by this point as he was now wealthy and in retirement. Neither of these poetry collections had "John Donne's" name on their title-pages; the "Donne" name only appeared on the 3 tested sermons that were ghostwritten mostly by Verstegan, with some help from Harvey (it was unusual for Harvey to ghostwrite sermons, most of which were Verstegan's specialty). The "Donne" verse is not any more "erotic" than a collection like "Coelia" (Volume 3 in my series); the latter has just not been popularized by academia, in other words it has been censored specifically for its eroticism because it is a bit more erotic and especially more overtly homoerotic.

559Bushwhacked
Dic 21, 2021, 1:25 am

... and there I was thinking UNSW English Lit. 101 circa 1989 had provided me all the answers. Bugger.

Well folks, it's Christmas and I'm off, so take care, happy holidays, and don't forget to leave Santa a cold beer and a mince pie under the tree!

560susanbooks
Dic 21, 2021, 9:45 am

>558 faktorovich: And again you prove your ignorance. Any nuanced reading of Donne, particularly something like "A Valediction Forbidding Mourning" or "Batter My Heart," demonstrates his own particular brand of eroticism which, in the former, is quite funny. Jonson wasn't even a metaphysical poet.

561faktorovich
Dic 21, 2021, 1:04 pm

"John Donne's"
A VALEDICTION FORBIDDING MOURNING.

AS virtuous men pass mildly away,
And whisper to their souls to go,
Whilst some of their sad friends do say,
"Now his breath goes," and some say, "No."

So let us melt, and make no noise,
No tear-floods, nor sigh-tempests move ;
'Twere profanation of our joys
To tell the laity our love.

Moving of th' earth brings harms and fears ;
Men reckon what it did, and meant ;
But trepidation of the spheres,
Though greater far, is innocent.

Dull sublunary lovers' love
—Whose soul is sense—cannot admit
Of absence, 'cause it doth remove
The thing which elemented it.

But we by a love so much refined,
That ourselves know not what it is,
Inter-assured of the mind,
Care less, eyes, lips and hands to miss.

Our two souls therefore, which are one,
Though I must go, endure not yet
A breach, but an expansion,
Like gold to aery thinness beat.

If they be two, they are two so
As stiff twin compasses are two ;
Thy soul, the fix'd foot, makes no show
To move, but doth, if th' other do.

And though it in the centre sit,
Yet, when the other far doth roam,
It leans, and hearkens after it,
And grows erect, as that comes home.

Such wilt thou be to me, who must,
Like th' other foot, obliquely run ;
Thy firmness makes my circle just,
And makes me end where I begun.

Ben Jonson's "Volpone" (2 sections that echo some of the words in "Donne's" poem)

MOS: O, sir, the wonder,
The blazing star of Italy! a wench
Of the first year! a beauty ripe as harvest!
Whose skin is whiter than a swan all over,
Than silver, snow, or lilies! a soft lip,
Would tempt you to eternity of kissing!
And flesh that melteth in the touch to blood!
Bright as your gold, and lovely as your gold!...

CEL: If you have ears that will be pierc'd—or eyes
That can be open'd—a heart that may be touch'd—
Or any part that yet sounds man about you—
If you have touch of holy saints—or heaven—
Do me the grace to let me 'scape—if not,
Be bountiful and kill me. You do know,
I am a creature, hither ill betray'd,
By one, whose shame I would forget it were:
If you will deign me neither of these graces,
Yet feed your wrath, sir, rather than your lust,
(It is a vice comes nearer manliness,)
And punish that unhappy crime of nature,
Which you miscall my beauty; flay my face,
Or poison it with ointments, for seducing
Your blood to this rebellion. Rub these hands,
With what may cause an eating leprosy,
E'en to my bones and marrow: any thing,
That may disfavour me, save in my honour—
And I will kneel to you, pray for you, pay down
A thousand hourly vows, sir, for your health;
Report, and think you virtuous—

If you read these fragments without checking the bylines, and if you had not memorized the "Donne" poem, could you really argue they were written by different writers, and not clearly in a single literary (and linguistic) style?

562prosfilaes
Modificato: Dic 21, 2021, 7:04 pm

>561 faktorovich: If you read these fragments without checking the bylines and if you had not memorized the "Donne" poem, can you really argue they were written by different writers, and not clearly a single literary (and linguistic) style?

Yes. I would not argue that they must have been written by two different authors, but I certainly fail to see the single literary style you propose. Compare

AS virtuous men pass mildly away, and whisper to their souls to go, whilst some of their sad friends do say, "Now his breath goes," and some say, "No."

to

O, sir, the wonder, the blazing star of Italy! a wench of the first year! a beauty ripe as harvest! whose skin is whiter than a swan all over, than silver, snow, or lilies!

What similarity do you see?

More numerically, the first is 235 words in 9 sentences, all pretty close to the average of 26 words. The second is 238 words, but has sentences of 53 and 71 words across the two samples, with the short opening phrases split by exclamation points (or read as one sentence with exclamation points intermixed). The first uses zero exclamation points and two em-dashes; the second uses seven exclamation points and eight em-dashes.

This is from a math major; I may miss things because of that, but I have no vested interest in John Donne or Ben Jonson. These two snippets of text seem far from being similar enough that I would expect the same hand wrote them.
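For anyone who wants to reproduce counts of the kind quoted above (word totals, sentence lengths, exclamation points, em-dashes), here is a minimal sketch; the punctuation handling is deliberately crude, so exact figures may differ slightly from counts done by hand.

```python
import re

def surface_counts(passage: str) -> dict:
    """Rough word, sentence, and punctuation tallies for a short passage."""
    words = passage.split()
    # Crude sentence split on ., !, and ?; good enough for a rough comparison.
    sentences = [s for s in re.split(r"[.!?]+", passage) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return {
        "words": len(words),
        "sentences": len(sentences),
        "avg_sentence_length": sum(lengths) / max(len(sentences), 1),
        "exclamation_points": passage.count("!"),
        "em_dashes": passage.count("\u2014") + passage.count("--"),  # em-dash character or double hyphen
    }
```

Run over the two passages quoted in >561 faktorovich:, this makes the contrast in sentence length and exclamation use easy to check for oneself.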

563faktorovich
Dic 21, 2021, 10:38 pm

>562 prosfilaes: As I explained, "Donne's" "Sonnets" is co-ghostwritten by Jonson and Byrd, which means that its linguistic style is a relatively weaker match to the texts either of these ghostwriters wrote individually. Despite this, "Donne's" "Sonnets" still show several strong indicators of Jonson's hand. The question-rate is relatively similar at 18 to 13. The overall lexical-density is very similar at 50.66 to 49.15. Most of the word-group categories are close matches. For example, the adjectives rate is 6.14 to 6.9. The adverbs rate is 5.94 to 5.79. The characters-per-word rate is near-identical at 4.05 to 4.03. The rate of I-words is also near identical at 3.3 to 3.2. If you have spotted distinctions in the sentence length and other elements of these specific fragments, you should test each poem/fragment with the full 27-tests method, and this might lead to the conclusion that Byrd was the dominant hand behind "Valediction", while Jonson wrote the majority of the other poems in this collection. The co-writer attribution for "Sonnets" means the two could have both written portions of each poem, or each could have written entire separate poems that were combined in the collection. The E-letter-pattern in "Sonnets" (e, t, o, a, s, h) appears in several Jonson texts; it also includes "I would not" among its top-6 phrases, a phrase that also appears in Jonson's "Shakespeare"-bylined "Othello".

The opening lines between "Valediction" and these "Volpone" fragments are not uniquely similar. The key terms I searched for between the two include "melt"/"melteth", and "heart". There are also many "erotic" phrases in these "Volpone" fragments, or as many as in "Valediction", including: "a beauty ripe as harvest", "a soft lip, Would tempt you to eternity of kissing!" "flesh that melteth in the touch to blood!" "a heart that may be touch'd", "Yet feed your wrath, sir, rather than your lust", and "Or poison it with ointments, for seducing/ Your blood to this rebellion. Rub these hands,/ With what may cause an eating leprosy".

The differences in "sentence" length you notice are probably due to the different line meters used in these segments; the shorter lines pressure the writer to end their sentences with fewer words to fit this meter. This is why I do not measure words-per-sentence in the Renaissance as one of the 27-tests chosen for this period; because this measure is not reliable at an age when most of the texts were influenced by the meter of the lines. It is also illogical to measure the exclamation rate in any isolated short fragment, as this is a test that only begins to be accurate when an entire lengthy take is considered, and accidents of subject-matter etc. become irrelevant. "Donne's" "Sonnets" and "Volpone" happen to also be very different as a whole in their exclamation rate, and this is a measure where Byrd's style clearly tilted the total average count, as unlike Jonson, Byrd disliked using exclamations; for example, has a similar exclamation rate of 3 in both "Drayton's" "Idea" and "Raleigh's" "Poems", as in this "Donne's" "Sonnets" rate of 5. The em-dashes is a horrid measure in the Renaissance, perhaps because of typesetter preferences, or to changing tastes over time; this is why I did not run a test for dashes or em-dashes. In these fragments, the em-dashes are less frequent when the semi-colon is used in their place in "Volpone", but if the two are substituted with a single mark, their rates would be similar.

Does this explanation clarify the similarity between these fragments?
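As a rough sketch of how a few of the measures named above (question rate, characters per word, rate of I-words, lexical density) could be computed: the short function-word list below is a hypothetical stand-in for however the original tests define content words, and the part-of-speech tagging needed for adjective and adverb rates is omitted entirely, so the numbers it yields will not match the figures quoted in this thread.

```python
import re

# Hypothetical stand-in for a proper function-word list.
FUNCTION_WORDS = {
    "the", "a", "an", "and", "or", "but", "of", "to", "in", "on", "by",
    "for", "with", "as", "at", "that", "this", "it", "is", "are", "be",
}

def style_measures(text: str) -> dict:
    """A handful of simple per-text rates of the kind discussed above."""
    words = [w.lower() for w in re.findall(r"[a-zA-Z']+", text)]
    n = max(len(words), 1)
    content_words = [w for w in words if w not in FUNCTION_WORDS]
    return {
        "chars_per_word": sum(len(w) for w in words) / n,
        "lexical_density_pct": 100 * len(content_words) / n,
        "i_words_pct": 100 * sum(1 for w in words if w == "i") / n,
        "questions_per_1000_words": 1000 * text.count("?") / n,
        "exclamations_per_1000_words": 1000 * text.count("!") / n,
    }
```

Whether closeness on such rates can carry any conclusion about shared authorship is, of course, the point in dispute.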

564SandraArdnas
Dic 22, 2021, 7:23 am

As I explained, "Donne's" "Sonnets" is co-ghostwritten by Jonson and Byrd, which means that its linguistic style is a relatively weaker match to the texts either of these ghostwriters wrote individually. Despite this, "Donne's" "Sonnets" still show several strong indicators of Jonson's hand.

How on earth do you expect to be taken seriously? How? And how did you earn a PhD with that abomination of reasoning faculties?

565susanbooks
Modificato: Dic 22, 2021, 11:20 am

This message has been flagged by multiple users and is no longer displayed (show)
>564 SandraArdnas: "How on earth do you expect to be taken seriously? How? And how did you earn a PhD with that abomination of reasoning faculties?"

Right? It's astonishingly appalling. Maybe in between the awarding of the PhD & now she suffered a TBI or Eastern Equine Encephalitis? I know two people who were infected with the latter and their brains completely changed. Neither were able to keep working at their jobs (high school math teacher and corporate attorney) despite remaining intelligent. I didn't know them before their infections but supposedly personality changes are also common & they are both quite prickly & difficult to reason with now.

566amanda4242
Dic 22, 2021, 10:46 am

>561 faktorovich: Did you actually read the poems or just run them through your magic number generator?

567susanbooks
Modificato: Dic 22, 2021, 11:14 am

faktorovich: you still haven't answered my question re: where your PhD is from. Inquiring minds would like to know. In fair exchange, mine is in English from Tufts University.

568lilithcat
Dic 22, 2021, 11:28 am

>567 susanbooks:

According to her LinkedIn page: https://www.linkedin.com/in/anna-faktorovich-6063812b , it's from Indiana University of Pennsylvania

569Keeline
Dic 22, 2021, 12:17 pm

>568 lilithcat: Living in California, I was unfamiliar with Indiana University of Pennsylvania (in Indiana, PA) so I looked up the page on their graduate program:

https://www.iup.edu/admissions/graduate/index.html

James

570faktorovich
Dic 22, 2021, 1:14 pm

>566 amanda4242: It is clear that I have read the "Volpone" and "Donne" fragments that I cited because I quoted the relevant sections from "Volpone" to explain the similarity. Your reactions, on the other hand, indicate that you are not reading the evidence you are asking me for, and just repeating the typical strings of insults you probably launch at every researcher to discredit theories that rival your own.

571amanda4242
Dic 22, 2021, 2:06 pm

>570 faktorovich: LOL! All you've shown is that you can copy and paste. Thank you for the laugh!

572SandraArdnas
Dic 22, 2021, 2:07 pm

>570 faktorovich: Unlike you, we don't have pet theories. We look at the arguments presented. You consistently employ faulty deductive/inductive logic to further yours, while simultaneously and blatantly failing to address the objections presented. It would be laughable if it hadn't become depressing by now.

Isn't GRE a prerequisite for any MA and PhD program in the US? With logic like 'if you can't categorically determine several authors, it follows it must be by one (and that is categorical and indisputable, apparently)', there is no way whatsoever to achieve even a remotely respectable score on either the logical or the verbal part of it. Seriously, it's basic reasoning, and the fact that someone with a PhD fails to comprehend it is beyond belief.

573lilithcat
Dic 22, 2021, 2:34 pm

>572 SandraArdnas:
Isn't GRE a prerequisite for any MA and PhD program in the US?

Apparently not.

"GRE scores are optional, but please submit them if you feel they help your application." https://www.iup.edu/english/grad/literature-criticism-phd/how-to-apply.html

574cpg
Dic 22, 2021, 3:10 pm

>573 lilithcat:

It's not just IUP:

"GRE scores are no longer required for application to the PhD program in English." https://ase.tufts.edu/english/graduate/prospectiveStudents.htm

575lilithcat
Dic 22, 2021, 3:34 pm

>574 cpg:

I know a lot of colleges and universities no longer require SAT scores, so it doesn't surprise me that some graduate programs don't require GREs.

576faktorovich
Dic 22, 2021, 4:17 pm

>575 lilithcat: I passed the GRE and the LSAT. The GRE was definitely required by the University of South Carolina, where I received my MA. In fact, I was almost hired to write the LSAT by LSAC, but I opted instead to move into my current tiny house in Quanah, Texas to work on my independent research. I have written tests and course materials for organizations such as Bridgepoint Education (now Zovio), and several smaller schools, in addition to writing several textbooks of my own for the college English classes I have taught over the years. Why don't you address my perfectly logical explanation regarding the similarity between Jonson's "Volpone" and the "Donne" poems instead of attacking my perfect 4.0 GPA record at IUP? I am working right now on translating Jonson's letters, and they include a letter to "Donne", which Jonson signs off thus: "Your ever true Lover".

577SandraArdnas
Dic 22, 2021, 4:27 pm

>575 lilithcat: Not familiar with the SAT, but I thought it tests knowledge of specific areas. The GRE, OTOH, tests a skill set, not knowledge or information retrieval, and is actually a good indicator of the reasoning skills necessary for any scholarly/research area.

578SandraArdnas
Dic 22, 2021, 4:34 pm

>576 faktorovich: My eyes will fall out of their sockets from rolling at this. Did you employ the logic you're using here in your GRE and your textbooks, or is it reserved for your pet theories only? Maybe it isn't that you don't comprehend when you're committing logical fallacies left, right and center. Maybe you just don't care because of a vested interest in promoting your project and your books. Either way, it's a travesty.

579lilithcat
Dic 22, 2021, 4:59 pm

>576 faktorovich:

Nobody "attacked" your record. SandraArdnas queried whether the GRE was a requirement for all graduate programs, and I gave her an example showing that it was not.

How you figure that was an "attack" on your record is, quite frankly, beyond me.

580Petroglyph
Dic 22, 2021, 5:16 pm

>576 faktorovich:
"Why don't you address my perfectly logical explanation regarding the similarity between Jonson's "Volpone" and the "Donne" poems"

Because your "perfectly logical explanation" consists of a) shitty reasoning; b) the absolute beginner's mistake that is using absolute frequencies; c) taking your massively wrongheaded "methodology" for absolute truth; d) your refusal to consider alternative explanations (e.g. genre similarities); and a bunch of others. All of these things have been pointed out to you before, numerous times. Repeating your garbage claims, and handwaving criticism by assuming malice does not make those garbage claims a "perfectly logical explanation". You're too ignorant of computer science, linguistics, statistics and history to really understand just how unambiguously wrong your writings are.

Your book as well as your comments in this thread are a straightforward case of GIGO: Your method is absolute garbage; your understanding of frequency effects is absolute garbage; the way you've implemented your frequencies is absolute garbage; and, consequentially, your results and conclusions are absolute garbage.
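
To make point (b) concrete, here is a minimal Python sketch of the difference between absolute counts and length-normalized rates; the word counts and text lengths below are invented for illustration, not figures from the book.

# Minimal sketch: why raw (absolute) counts mislead when texts differ in length.
# The counts and lengths below are invented for illustration only.

def rate_per_1000(count: int, total_words: int) -> float:
    # Convert a raw occurrence count into a rate per 1,000 words.
    return 1000 * count / total_words

texts = {
    # name: (occurrences of some marker word, total word count)
    "short_play": (30, 10_000),
    "long_treatise": (90, 120_000),
}

for name, (count, total) in texts.items():
    print(f"{name}: raw count = {count}, rate = {rate_per_1000(count, total):.2f} per 1,000 words")

# The longer text has the higher raw count (90 vs 30) but a far lower rate
# (0.75 vs 3.00 per 1,000 words), so raw counts compared across texts of
# different lengths say little about stylistic preference.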

581Petroglyph
Dic 22, 2021, 5:35 pm

>579 lilithcat:
It's an underhanded debating tactic, and really all she's got.

Look back at her response to Crypto-Willobie's comment "what a load of bollocks" in >21 faktorovich: "Are you stating that I have an enormous load of testicles? Is this a negative because I am a woman and you are being ironic?"

Her response to me using the idiom "if it walks/talks/looks like a duck..." in >29 faktorovich: "Do you think the sort of post you have added here should be highlighted by LibraryThing instead? Maybe something that jumps between anti-Semitism, nonsense and insults against a woman's looks?"

Her response to susanbooks in >184 faktorovich: "Your argument about Nichols being a "dimwit" is extremely biased and irrational. What proof to you have for his stupidity, or for his isolation in an attic, and if he was isolated in an attic, don't you see a similarity there between this biographical point and the mad woman in the attic in "Charlotte Bronte's" Jane Eyre?"

(... numerous other examples ...)

>355 faktorovich: "But as a woman without any connections; I am not going to get any funding to develop this program or to market it to researchers. And even if it was the most brilliant program in the market; researchers like you would consider my gender a disqualifier and would refuse to use it even if it was free."

(... numerous other examples ...)

Her response to anglemark in >536 faktorovich: "without at least giving your name, so that your previous publications in this field can be verified. Without this information, you can be either puffing your own articles when you recommend "Leech... Short" and the other pieces you cite, or you can have no actual credentials in this field, and merely saying that you are qualified to give your opinions the appearance of superiority."

to amanda4242 in >570 faktorovich: "the typical strings of insults you probably launch at every researcher to discredit theories that rival your own"

She's demonstrated a pattern of responding to misunderstandings and criticism by assuming the other party is prejudiced, nasty, arguing in bad faith, out to get her, biased, falsifying their data, ... and then treating the other party as if that were totally true. There's really no point in trying to convince her, because you'll play by the rules of good-faith arguments, and she won't. And she'll use her bad-faith assumptions to ignore criticisms, and then she'll feel justified in spewing her garbage again, because it hasn't been appropriately addressed, or something.

582lilithcat
Dic 22, 2021, 5:40 pm

>581 Petroglyph:

You know, my high school French teacher taught us a saying, "Qui s'excuse, s'accuse". I think it fits.

583faktorovich
Dic 22, 2021, 5:48 pm

>580 Petroglyph: The frequency with which you use the term "garbage" only approaches the rate in "Annual report of the Department of Health of the State of New Jersey. 1893-94", which keeps referring to the cost, disposal, and other topics related to "garbage". However, the term is relevant in the discussion of waste disposal in this "Annual report", but is entirely irrelevant in any response to a discussion about computational-linguistics or literature. "Garbage" is "wasted or spoiled food and other refuse", or in computing, "unwanted data in a computer's memory". "Garbage" can have value, for example when unused food is given to the hogs, or when "garbage" is recycled and made into new products. Every human creates "garbage" when they consume food and leave bits unconsumed (packaging or crumbs). If you are using "garbage" as a metaphor, then research is the process of consuming information, and the result of consumption is a combination of energy, feces and "garbage". In research, these outputs are a book that explains the subject from a new perspective (energy), the unused research that remains unpublished (indigestible fiber), and the materials that were not considered because they were deemed to be irrelevant ("garbage"). Since I have published 14 books so far in this Renaissance series, I have created an enormous quantity of energy as the main output of my research, and I have not shared either the feces or the "garbage" that were generated during my research with the public. Thus, how have you come across these piles of "garbage" that I have left out of my books, having done my best not to mention them to avoid upsetting young minds?

584Petroglyph
Dic 22, 2021, 5:50 pm

>582 lilithcat:
Yup, that sums it up.

585Petroglyph
Modificato: Dic 22, 2021, 6:09 pm

>583 faktorovich:



ETA: just more deflecting, more avoidance. Pathetic is what it is.

586susanbooks
Dic 22, 2021, 11:38 pm

>580 Petroglyph: Petroglyph: The frequency with which you use the term "garbage" only approaches the rate in "Annual report of the Department of Health of the State of New Jersey. 1893-94"

Whoa! Petroglyph, you've been ghostwriting that long?!? Can you send me a list of the vitamins you take?

587Petroglyph
Dic 23, 2021, 2:05 am

>586 susanbooks:
I'll bring some the next time our LT ghost-writing workshop meets. Do you like licorice-flavoured tablets?

Wait till she finds out several of us here can write in more than one language...

588MarthaJeanne
Dic 23, 2021, 2:19 am

>587 Petroglyph: Can we? We can certainly read in several languages.

589spiphany
Dic 23, 2021, 3:14 am

Because apparently I don’t have anything better to do with my time (or procrastinating on that translation project is more attractive than actually working on it), here is a list, in no particular order, of some of the reasons I can think of that would affect how similar two texts are, regardless of authorship, and some of the linguistic markers involved (a rough sketch of how a couple of these markers can be measured follows the list):

- editorial influence (punctuation, other formal aspects such as spelling, writing out numbers or using numerals, etc.)
- genre (e.g., an essay is probably going to have a higher rate of relative clauses and passive voice, a playtext would likely use more features of spoken language, ellipses, colloquial and informal phrases, etc.)
- text-specific characteristics (different frequency of particular pronouns in first-person vs. third-person narration; greater frequency of quotation marks in dialogue-heavy fiction than in descriptive passages)
- dialect and idiolect (specific words, phrases, or grammatical constructions that are relatively uncommon in the corpus as a whole, but more frequently used by one speaker or group of speakers)
- formulaic language (common, e.g., in certain poetry genres and oral traditions, where use of set phrases or images is a compositional technique and/or a part of the storytelling ritual)
- extensive use of quotations from other authors or traditional sources (i.e., some of the text does not represent the author’s own language usage)
- imitation of other authors or text styles
- style of writing instruction (i.e., authors who have all been taught to write a certain way may resemble one another more than authors taught to follow a different set of rules or authors encouraged to experiment and find their own eclectic style; it was usual at some periods of history for composition to be taught in terms of strict imitation of classical models)
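
A rough sketch of how a couple of the markers above can be measured, as a hedged illustration only: the two snippets are invented, and this is not anyone's actual attribution method. The differences it reports reflect narration and genre rather than authorship.

import re

# Rough illustration: a few of the surface markers listed above, computed on
# two tiny invented snippets. Differences here reflect narration and genre,
# not necessarily different authors.

FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our"}
THIRD_PERSON = {"he", "him", "his", "she", "her", "hers", "they", "them", "their"}

def marker_rates(text: str) -> dict:
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words)
    return {
        "first_person_per_100_words": 100 * sum(w in FIRST_PERSON for w in words) / n,
        "third_person_per_100_words": 100 * sum(w in THIRD_PERSON for w in words) / n,
        "quote_marks_per_100_words": 100 * text.count('"') / n,
    }

first_person_memoir = "I walked to the shore and I thought of my father, and of what we had lost."
third_person_dialogue = '"Go," she said. He took his coat, and they left her house without a word.'

for name, sample in [("memoir", first_person_memoir), ("dialogue", third_person_dialogue)]:
    print(name, marker_rates(sample))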

590spiphany
Dic 23, 2021, 3:24 am

And since nobody has brought this up yet:

I’m having a great deal of trouble imagining any reason why it would make sense for your “workshop” of six writers to publish their works under the identities of dozens of real persons who were otherwise not active as authors, or why literary scholars in the centuries since have assigned the publications of your “workshop” to these dozens of persons if there were not some evidence that they had, in fact, written some of these works. It is equally hard to believe that such a workshop has remained completely undiscovered until now, or that six such prolific writers left no documentary trace (correspondence, appointment books, diaries, etc.) of their participation in this literary production machine.

Please note: I am not denying that some works of the English renaissance were published under pseudonyms, or that other works were published anonymously. There are lots of reasons why doing this probably made sense in specific cases. Censorship was a thing, and certainly some authors would have risked political repercussions if they had published certain views under their own names. Likewise, in some contexts (for example, literary magazines, from what I understand) it was usual for texts to be printed without authorial attribution or with only initials to indicate authorship.

None of this is indicative of the sort of systematic and widespread misattribution of authors that you claim to have uncovered.

When pseudonyms are used to protect an author who is writing on sensitive topics, it is common for these pseudonyms to be obviously made-up (or at least, not to correspond to a real person). Likewise, attributing a work to someone other than the actual author may be done for reasons of prestige -- because the person to whom the work is attributed is more famous or more authoritative or has a recognizable “brand” that can be used to market the work. Sometimes the opposite happens -- i.e., a work is attributed to someone of lower social status than the author -- in order to present the text as an “authentic” example of unspoiled traditional storytelling or workers’ voices or whatever.

Again, these are all cases where authorship is obscured for specific texts for specific reasons. It doesn’t explain why a handful of writers would publish an entire vast and varied body of work under a variety of false names, or why they would go to the effort of finding real people of diverse backgrounds from across England to whom the works were to be attributed.

What possible purpose would this serve? How would anyone benefit from this?

591MarthaJeanne
Dic 23, 2021, 3:30 am

>590 spiphany: The other question is how this group could have prevented anybody else from publishing.

592Petroglyph
Dic 23, 2021, 8:58 am

>588 MarthaJeanne:
Whether we can isn't really the question -- rather, are we allowed to? Will we be accused of secretly making fun of her? Will she scrutinize our translation skills? Only the future knows...

>591 MarthaJeanne:
I'm sure her book goes into great conspiratorial detail on this matter. I can report back when I get to that section. Realistically, though, we can probably wait for a copy-paste of the relevant pages.

593susanbooks
Dic 23, 2021, 10:15 am

>591 MarthaJeanne: "The other question is how this group could have prevented anybody else from publishing."

At any one time there are only 6 producers of writing alive. It's like the Vampire Slayer: you are called, must answer, & then produce. The rest of us may think we've produced writing but that's because we're under an evil spell that the Ghostwriters (Shazam!) fight to break in between -- even during! -- their copious writing projects.

594MarthaJeanne
Dic 23, 2021, 10:23 am

Don't know the Vampire slayer. Obviously I've been too busy reading books for the past few decades.

595scaifea
Dic 23, 2021, 10:28 am

>593 susanbooks: And just when I thought this thread had it all, someone makes a Buffy reference. Perfection! *continues to munch popcorn*

596spiphany
Dic 23, 2021, 10:45 am

>593 susanbooks: Alright, I suppose that since there are only Seven Basic Plots, why shouldn't it be only six people who are responsible for writing them? (Wait -- that's not very elegant, is it? Who goofed and created an extra plot, forcing one poor writer to double up?)

597susanbooks
Modificato: Dic 23, 2021, 10:47 am

>595 scaifea: I'll be Dark Willow. Bored now.

598susanbooks
Modificato: Dic 23, 2021, 10:55 am

>596 spiphany: There's always the Lame Ghostwriter, the Ghostwriter who has so little talent that the others have to rewrite his/her work almost completely. That Ghostwriter is free to work any plot since very little of his/her work remains in the completed piece. For instance, I have a 20-volume series on how Macbeth, begun by the Lame Ghostwriter of that period, started as doggerel about how hard it is to find good soap. Damned spot! (Less erudite scholars believe it began as a treatise on canine extermination. Fools!)

599scaifea
Dic 23, 2021, 11:01 am

>597 susanbooks: Yes!! Can you imagine how many times Giles would have cleaned his glasses in exasperation during this thread?

600paradoxosalpha
Dic 23, 2021, 11:05 am

>598 susanbooks:

I love the Lame Ghostwriter. It sounds like something out of If on a winter's night a traveller.

601bnielsen
Modificato: Dic 23, 2021, 11:16 am

>595 scaifea: It's been a while since a math reference, so:

>588 MarthaJeanne: and >592 Petroglyph: We must know. We will know.

602susanbooks
Dic 23, 2021, 11:11 am

>599 scaifea: Well, obviously this thread is under the control of a Writing Demon. Even smaller and more belligerent and self-confident than a Fear Demon (Don't taunt the Fear Demon!), the Writing Demon is an unpleasantly irrational, repetitive being who produces words & words & words & words, all tending to nowhere. This is indeed the thread for glasses wiping! And research -- to the library!

603susanbooks
Dic 23, 2021, 11:13 am

>600 paradoxosalpha: Calvino knew all about this stuff. His thousand-volume work on ghostwriting, alas, remains undiscovered.

604bnielsen
Dic 23, 2021, 11:21 am

>602 susanbooks: Among the many books present, but not yet found in "La biblioteca de Babel", I'm sure.

605MrAndrew
Dic 23, 2021, 11:50 am

>591 MarthaJeanne: I suspect the dead hand of the Illuminati. To the Vatican!

606SandraArdnas
Dic 23, 2021, 11:59 am

This thread has become pure gold. Who would have thought it possible? :)

607spiphany
Modificato: Dic 23, 2021, 12:16 pm

>606 SandraArdnas: I'm sure there are a few alchemists out there (Doctor Faustus perhaps?) who would be happy to hear that -- and the transmutation didn't even require an intervention by Hermes Trismegistus.

608faktorovich
Dic 23, 2021, 12:20 pm

>589 spiphany: I have already addressed most of these points in this discussion, but I will restate and summarize my position here. Overall, I considered all of these potential influences when I was designing the 27 tests that I applied to the Renaissance corpus, and judged that they would only help to identify the authorial signatures rather than distort them. (A rough sketch of the kind of surface measurements involved follows this list.)

- These ghostwriters edited each other, and at least 3 of them were also publishers/printers/typesetters who made editorial adjustments. If major editorial changes were made, these can register as a tertiary or minor match for the editing ghostwriter. If the changes are minor, they do not register on the tests, and this means their contributions were so small that they should not be credited as author(s) of a given text. If the editor (re-)wrote so much of a given text that his or her signature registers as the dominant one, then whoever is being edited cannot be considered the author, but rather an inspiration or the like for the text. I avoided punctuation marks such as em-dashes and dashes because errors with these can be introduced during digitization, or indeed by a typesetter who is not an author and simply uses a dash where an author used a semicolon or the like. The handwritten versions of these manuscripts usually have the same types of spelling/punctuation errors as the printed editions (I checked this while translating "Captain Underwit/Country Captain: Volume 14"), and this means that the typesetters almost never introduced new errors. The 7 copies of one of these books I translated in this set had slightly different misspellings, which indicated the typesetter deliberately flipped over letters like "u" to "n" etc. as a joke, and for this the typesetter probably had to be one of the writers, or he would not have seen the joke in this.

- No, the genre does not change the authorship attribution. For example, Verstegan ghostwrote both the masque, "Munday's" "Banquet of Dainty Conceits" (16.6 passive) and the non-fiction "King James"/ "Bancroft" "Bible" (20.6 passive) - these figures might seem different, but only 13 other texts out of the 284 fall between them across this test's full range. They are also similar on most of the other tests, including syllables-per-word, where they are near-identical at 1.36 and 1.35. There are only a few measures on which they differ, such as semicolons, at 0 in the masque and 34 in the Bible, but this is why the combination of 27 different tests matters: a glitch in any one test does not change the overall attribution of 2 texts as similar.

- Again, the frequency of first- vs. third-person pronoun usage is itself a linguistic preference that helps to identify an author. Some writers are more comfortable using the first or the third person and avoid the other voice; some writers might be comfortable in both. These 3 different choices register as at least 3 different style elements, and the variations in these preferences strengthen as different aspects are measured on the 27 tests. I do not test for apostrophes or quotation marks in the Renaissance corpus because they are used inconsistently, owing to digitization errors and the changing preferences of typesetters over time.

- In volumes 3-14 (and the forthcoming volumes 15-28 or so) I explain thousands of occurrences of rare words that appear only in Percy's or one of the other ghostwriters' texts, and not in any of the texts from the other ghostwriters' groups. I also explain cases of overlap where one ghostwriter introduces a word he invented and others then begin to use it. This is part of the reason I started doing the translations: to show proof in individual word usage, as well as in story structure etc. Most of these are not based on dialect, but on these ghostwriters adding words they invented to dictionaries, or to the texts as new words became necessary. There are also a few examples of Scottish, French, or other dialects being used by characters, but these are relatively rare, and these small linguistic tricks do not influence the overall signatures.

- The use of repeating phrases in poetry is common in the Greco-Roman period, but not in the British Renaissance. A few poems or pieces might have unusually frequent phrases that are part of the formula, but these are rare exceptions, and if a full text of over 10,000 words is tested, I have seen no cases where these repetitions affect anything other than perhaps the 3-word phrases, which are not one of the quantitative 27 tests, so they do not impact the attribution.

- I have taken out all prolonged direct quotations, in foreign languages and in English, when they were easily noticeable as I cleaned up a text. The quotes that are not easily noticeable are usually translations from Latin or another language, and these translations are always so weakly based on the original that the translator's signature becomes the obvious dominant signature for the piece. Quotes from proverbs are common, but the preference for quoting certain proverbs is itself a linguistic habit that accurately identifies the user if it is overused, and it is not noticed by the tests if there are only a few small quotes per text. Most proverbial quotes were re-written in new versions each time they were reused in the Renaissance, and these re-writings have the linguistic traits of the text's main author and not of the original proverb writer. And in some cases these ghostwriters first wrote a book of proverb variations before re-using these in new versions in their later texts.

- It is possible to imitate the sonnet or the madrigal poetic formula, but not the unconscious habits (such as percentage of nouns).

- Any professional ghostwriter would disagree with the assertion that a substantial quantity of texts (hundreds) can be created without the ghostwriter having to do their own historical/literary research (for each new topic), their own experiments in formulaic variations (to avoid the texts being spotted as obvious plagiarisms), their own linguistic research (adopting new words from foreign languages, or finding rare words in dictionaries that apply to strange descriptions), etc. Even 2 students in a strict English class who are asked to write a sonnet each (about love, with specific plot movements, etc.) cannot come up with linguistically matching sonnets: not only because of their different life experiences, or because one might have been studying new vocabulary more than the other, but because of their distinct character traits, etc.
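
The sketch mentioned above, for concreteness: this is not the 27-test implementation itself, only a rough illustration of how surface measurements of this kind (syllables per word, semicolons per 1,000 words, the top-6 most frequent words) are commonly computed. The syllable count is a crude vowel-group heuristic and the two sample lines are invented.

import re
from collections import Counter

# Hedged sketch only: a few surface measurements of the kind described above.
# This is NOT the 27-test method; the syllable count is a crude vowel-group
# heuristic and the sample texts are invented.

def crude_syllables(word: str) -> int:
    # Count groups of vowels as a rough proxy for syllables.
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def surface_features(text: str) -> dict:
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words)
    return {
        "syllables_per_word": round(sum(crude_syllables(w) for w in words) / n, 2),
        "semicolons_per_1000_words": round(1000 * text.count(";") / n, 1),
        "top_6_words": [w for w, _ in Counter(words).most_common(6)],
    }

sample_a = "The masque delights the court; the court repays the poet with gold and favour."
sample_b = "In the beginning was the word, and the word was with God, and the word was God."

print("A:", surface_features(sample_a))
print("B:", surface_features(sample_b))

# Whether numbers like these separate authors, rather than genres, editors or
# typesetters, is exactly the point in dispute in this thread.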

609susanbooks
Modificato: Dic 23, 2021, 12:41 pm

I'm reading The Tenth Muse, a novel by Catherine Chung & this sentence about the narrator's time in grad school made me think of this thread:

"Whereas before I had been cautious, preparing for hours before I put forth a thought, I learned now to be flexible, to throw out ideas to be questioned relentlessly so they would become more robust as they developed" (87).

A less enlightened mind might think I was making a connection with things we've all said earlier, about how scholarship -- knowledge -- is created through contestation, through ideas rubbing up against each other, rather than one lone voice repeating itself over & over with never a glance (horribly mixed metaphor) around.

But, having been educated by this thread, I noticed, by my English-PhD-level math, that the word "to" (2) occurs 3 times in that sentence, clearly indicating that Catherine Chung did not write the novel I'm reading but instead that John Adams, the 2nd President of the US of America, ghostwrote this book. Further analysis reveals that "I" also occurs thrice, an obvious reference, coupled with the revelation of John Adams' authorship, to the poem "I, Too*, Sing America" by Langston Hughes. Thus I have proven that this putative 2019 novel was actually ghostwritten by John Adams and Langston Hughes. QED

Reading for content is for rubes!

* to, 2

610Marissa_Doyle
Dic 23, 2021, 12:45 pm

>602 susanbooks: Ah! Do you mean the great Titivillus? https://en.wikipedia.org/wiki/Titivillus

611faktorovich
Dic 23, 2021, 1:22 pm

>590 spiphany: The use of pseudonyms was a legal necessity. Verstegan and Harvey began ghostwriting for Sir Vere and Elizabeth as secret-secretaries; obviously they could not put their own names as co-authors of letters that were being sent out to foreign dignitaries/aristocrats etc. Then, Elizabeth approved monopoly patents such as the music/poetry patent that Byrd received in 1575, and Verstegan probably gained the textbook patent under a pseudonym. The patent meant that all who published music had to receive permission/a license from Byrd. If Byrd had published song-books/poetry only under his own byline and used only himself as the printer, this would have obviously offended rival printers/writers, who would have filed legal complaints over this monopolistic practice. So Byrd learned to avoid this by having underhanded control over a network of printers (which could have been just his pseudonyms), like "Morley" and "Barley". Harvey and Verstegan, meanwhile, started getting traditional ghostwriting projects from politicians, lawyers, and aristocrats who paid them to write books with their bylines on them; this was normal because aristocrats had been using acting troupes, singers, poets etc. to write songs in their honor, glorify them, and add fame, to become more recognizable to the monarch/government and thus to rise to higher aristocratic/government positions. Making money from selling the books themselves was so unprofitable (so little demand from the illiterate British public) that Byrd had to ask for land/money grants from Elizabeth to keep his business going within a year or so of getting the initial patent. They were all pretty far into these schemes before a few complaints began to surface of plagiarism or of reprinting of texts under multiple bylines. At this point, they realized that each case of their using a pseudonym was legally considered to be fraud, so confessing to ghostwriting was not an option, as it could have led to extremely severe punishments for the ghostwriter and the contractor. At some point books did start selling, especially to university/school students who were required to buy them for courses. This new popularity of literacy meant that aristocrats/monarchs/merchants etc. started to take offense at negative things said about them in these books, and started filing libel/treason lawsuits against whoever was named in the books' bylines, or against their official named printers/booksellers. For example, Jonson had several treason/libel lawsuits brought against him, and Verstegan was exiled over publishing a treasonous pro-Catholic book in 1581. So the Workshop's writers came to prefer to publish most of their books anonymously across the following decades. But they learned that creating quarrels over attribution (without actually hinting at their own identity) could generate more book sales, as was the case with the Marprelate debate, where they used multiple obviously fictitious bylines that cast blame at other fictitious bylines etc. And they could hint at a given aristocrat's/merchant's authorship by using their initials, and then solicit them to pay to have their full name put in the byline of the next book they might write, as even suspicion of their authorship might have generated fame via puffing mentions in the press. Using initials also attracted readers better than entirely anonymous texts, because readers prefer to imagine they are befriending a specific person in the author.
Meanwhile, several people had been executed over the Workshop's Catholic/anti-Catholic, Marprelate/anti-Marprelate, pro/anti-Elizabeth (effigies) etc. writings, so confessing to this whole scheme by 1590 would have been suicidal, as all of them would have been immediately executed for treason. While Sylvester, Harvey and Verstegan largely made a living by being hired as ghostwriters by wealthy patrons, Percy had a different problem. As I explained, Percy's uncle was executed, his father was assassinated while at the Tower, and his brother was imprisoned in the Tower for decades; if Percy had used his own byline back in 1584-5, when he first started writing plays, he would have potentially put his father in still greater danger of execution, and since his father was assassinated in prison, these fears were justified. For example, Percy's "Fedele" (Volume 9) includes effigies that hint at the effigies of Elizabeth sitting on dung that had been found in London a bit earlier; this insult against the Queen alone would probably have led to William's execution together with his father, and while he was still probably just starting college. Thus, Percy had to use the "William Shakespeare" pseudonym in 1594 when he invested a part of the £2,400 loan he received back in 1593 in building the official theater duopoly that Elizabeth granted to 2 troupes (Chamberlain's and Admiral's). If "Shakespeare" had existed as a real person, he would have taken the money for himself; whereas if the "Shakespeare" signature was forged by Percy, he could use it to profit, but also could not be caught using it for fraudulent transactions, since this was not his actual legal name. The one poetry collection, "Coelia" (Volume 3), that Percy published under his own byline was pretty controversial as well, but only 3 or so copies of it were printed, and all were probably kept by Percy, so it avoided causing offense, and showed Percy that his name was not likely to attract buyers. To recuperate the massive £2,400 loan that a merchant (probably an affiliate of Sylvester's, who was a secretary to the Merchant Adventurers) gave Percy, it appears that Sylvester and Percy set up the usurious lending scheme against play books as commodities, which allowed them to take money from investors who had read pufferies (written by the Workshop) suggesting that some plays saw as many as 10,000 audience members per show, and who believed that by hiring a ghostwriter and paying the troupe to stage a performance, they were likely to recuperate not only this sum, but a great deal in profits after receipts were counted. Since there was an extremely small number of the public who actually attended plays, most of these investments failed, but the ghostwriters etc. got to keep the funds for the work. Because Percy used multiple pseudonyms, it was not apparent that he and Jonson became the dominant dramatic ghostwriters between the 1580s and 1642; and by using pseudonyms Percy could hire himself as the manager of troupes in charge of selecting new plays, and then choose only his own and Jonson's plays. But most of the confusion in later re-attributions stems from the fact that "Henslowe's Diary" recorded the names of the lenders/investors in this scheme, and not the "authors" - the payments are specified to be "lent" money, or invested, but scholars have been reading these lines as income for these authors, and not as money borrowed at usurious interest rates, or money invested and never returned.

The problem of misattribution thus started during the Renaissance, and then, a few decades after Percy's death in 1648, Oxford scholars (such as Anthony-a-Wood) began adding fictitious biographies to some bylines that might have been used only once on a text, or might never have appeared on any text but belonged to aristocrats etc. who seemed as if they could have been writers. These biographical dictionaries etc. either confessed that they were based on "rumors" alone, or were "finding", i.e. forging, letters (which they did not reproduce, or for which they claimed the original handwritten versions had been destroyed). Wood and others were paid large sums for this "scholarship", at a time when there were probably more authors than jobs available for authors. And across the following centuries, most scholars who proposed re-attributing an anonymous or questionable text to an(other) author were published in respected scholarly journals. And initially anonymous books to which catalogers added assumed bylines became more valuable at rare-book sales. The whole thing snowballed into the nonsensical attributions we have ended up with today - I explain the absurdity of the current errors across the series. And now computational-linguists are testing a few books here, or a couple dozen books there, and their data is clearly showing clusters of similar texts, but they ignore the similarities if the current byline-attributions do not match the data, or they re-attribute a text from one byline to another based on the similarity between the two, without considering that it is also similar to texts with a dozen other byline-attributions once the corpus is expanded to 284+ texts.

These 6 ghostwriters left behind overwhelming volumes of evidence, including over a hundred books from Harvey's library with handwritten annotations, dozens of Jonson's letters and three Percy letters (at a time when any surviving letters from an "author" are rare), Byrd's fiscal documents and letters that confirm this narrative, Sylvester's handwriting (in his self-attributed letter to James I) matching the handwriting style of both "Mary Sidney" and "Philip Sidney", etc., etc., etc. I discuss the various pieces of documentary/confessional evidence across the book. Some of their letters specify that a lawyer accompanying the letter will explain the details, and this is why they are not more specific; the frauds were too convoluted to manage without legal assistance.

I am the first to discover this problem because: 1. computational-linguistics tools have only become available in the past 3 decades or so; 2. other computational-linguists are computer programmers who are not familiar with publishing/ghostwriting tricks, so to them the data might appear to contain errors when it is just showing the unlikely case of only 6 ghostwriters in the Renaissance; 3. I have absolutely no self-interest in preserving the current attributions; 4. I have had years of free time to work on this problem because I am self-employed as a publisher.

612Keeline
Dic 23, 2021, 2:29 pm

>608 faktorovich: This passage jumps out at me in reading this:

... and this means that the typesetters almost never introduced new errors. The 7 copies of one of these books I translated in this set had slightly different misspellings, which indicated the typesetter deliberately flipped over letters like "u" to "n" etc. as a joke, and for this the typesetter probably had to be one of the writers, or he would not have seen the joke in this.


This indicates to me that you have never been involved in hand-setting type. These days it is more of a hobby so it is not too surprising that you have not. But even so, you should be at least aware of the process. In case you are not let me mention a few of the steps.

Letters of type are added to a composing stick from designated compartments in the type case. A particular character is supposed to be in a particular compartment so the typesetter can grab the desired letter from muscle memory (much like touch typing).

The lines of type are locked into a chase and the print made as many times as desired.

However, since the type, usually made from an alloy of lead, tin and antimony, is heavy and a moderately expensive resource, most printers need to reuse the type for other pages.

Type is "distributed" by returning the letters to the compartments. Often this is done by a younger or less-experienced member of the staff since it is an apprentice role. The type is a mirror-image of what is printed so it is very easy to confuse certain letters and this includes "u"|"n" "d"|"b"|"p"|"q" and other combinations. Indeed, one of the origins of the expression "mind your p's and q's" comes from this aspect of typesetting.

The result of this is that it is very easy for a similar-looking letter to be in the wrong compartment. The type has grooves to help orient it to the baseline so it can be positioned right side up without looking. However, when the wrong letter is in the compartment, this might not be noticed until a proof print is pulled. At this point corrections can be made if there is time and if the error is noticed.

This was not a clever joke on the part of typesetters. It was a byproduct of the whole process.

The typesetting machines of the 19th C such as the Linotype removed a lot of this particular problem because the lines of type were melted and new ones cast from the keys struck on the keyboard. The wrong key can be pressed easily but the matrices for each letter are returned to precisely the correct location based on the clever system of notches in the brass matrix.

To say that "typesetters almost never introduced new errors" is something with which I cannot agree.

James

613faktorovich
Dic 23, 2021, 2:56 pm

>612 Keeline: In fact, while researching the manner in which books were printed, I followed the standard method to create my own quarto, cutting and folding sheets and placing the page text to form a little book, to understand whether this might have been the reason some letters were upside down, etc. I explain this experiment and what it indicates in one of the translation volumes, where I explain the different possible/present typesetting glitches. I have also been designing/typesetting books for my Anaphora Literary Press for 13 years, including over 300 titles I have personally typeset/designed.

The errors where letters were upside down, "u" vs. "n" etc., were changed haphazardly, or different letters were changed in each new printing: new errors were introduced even as old ones were fixed, instead of errors simply being corrected. This is why, as I have explained, these were clear typesetting jokes. This has nothing to do with the process of how the letters were added to the template/print-page to be used in printing.

There is no way to verify whether the writer was also the typesetter, the editor, etc. and created his own book himself, or whether there was an army of 100 typesetters working at a single print shop. The provable evidence is whom the final text matches linguistically, who was registered as the business owner of the specified print-shop, and whether there is any chance this owner's name was a pseudonym (if there is no evidence of a birth/death date etc. to support this name's existence in real London). Thus, you are imagining "younger" apprentices setting type, when it could have instead been set by the author himself. You cannot argue with computational-linguistic evidence by disputing that you imagine how it can be wrong because you have imagined an army of young typesetters.

614anglemark
Dic 23, 2021, 4:14 pm

>608 faktorovich: You said: "For example, Verstegan ghostwrote both the masque, "Munday's" "Banquet of Dainty Conceits" (16.6 passive) and the non-fiction "King James"/ "Bancroft" "Bible" (20.6 passive) ..."

Do I understand you correctly? Is your claim that there was only one "ghostwriter"/author responsible for the entire King James Version of the Bible? How do you reconcile that with the existing scholarship on the KJV, which has identified 45-50 people working in different groups to translate the various books of the Old and the New Testament? Which version of the text did you analyse?

615Petroglyph
Modificato: Dic 23, 2021, 6:18 pm

>614 anglemark:
From "Volumes 1-2" (696pp.):

Harvey and Verstegan cooperated on some of their longest projects. One of their collaborations where Verstegan took the lead is the translation of the Holy Bible, Containing the Old Testament, and the New (790,021 words; 1611), which is known as the King James Bible. (p. 87)

But “Bacon’s” sermons were probably predominantly ghostwritten by Verstegan, who wrote all of the tested rhetorical-sermons. Verstegan attributed a few of his projects to workshops of scholars, as he did in claiming that “Richard Bancroft” and a group of university scholars and clerics jointly translated King James Bible (1611). (p. 179)

Harvey would have multiplied his profits from Misfortunes by attributing it to several lawyers instead of any single byline, just as Verstegan would have multiplied his profits by assigning the credit for King James Bible to a collaborative of multiple theologians instead of merely the recently deceased “Richard Bancroft”. (p. 180)

When [Verstegan] was not engaging in revolution, he was quietly ghostwriting the King James Bible, Pope Gregory XIII’s Calendarium Gregorianum (Latin for the Gregorian Calendar), and crafting elegant speeches for Elizabeth I and anti-witchcraft propaganda in Demonology for James I. (p. 244)

the poem [Venus and Adonis] includes a Jonson-sized frequency in its three occurrences of the insult “fool”. This rate is only seen in the Jonson’s “Shakespearean” comedies. “Fool” is near-absent from “Shakespeare’s” poetry ghostwritten by Sylvester, Harvey and Byrd, and is much less frequent in the Percy’s tragedies. Jonson’s Twelfth Night includes 80 occurrences of the word “fool” out of 21,319 total word-count. Curiously, the Verstegan-ghostwritten King James Bible includes 199 occurrences of “fool”, but this count is out of 790,021 words (the Bible is 37X larger, but includes only 3X more “fools”, so the rate is .4% in Twelfth, compared to a microscopic .025% in the Bible); a similar pattern repeats in other texts in the Verstegan group, with 3 occurrences in his self-attributed Declaration, 4 in “Bancroft’s” and 10 in “Playfere’s” sermons, and 6 in Just Censure. Most importantly, there are no occurrences of “fool” in the aristocratically and royally bylined texts including the “Vere” letters, “Elizabeth’s” speeches, and “James’” Demonology. Verstegan found the usage of the slang word “fool” distasteful and he particularly avoided it when writing under politically-charged bylines. (pp. 261-262)

Because this text is 790,021-words long, it had to be split into three sections to run it through four of the tests, as the software programs could not process this enormous volume of data in the entire file. (p. 292)

Goddamn amateur hour.

The average syllables-per-word count for the three sections was near-identical (1.34, 1.36, 1.35); this is an outlying similarity because the range for the 284 texts for this test is between a low of 1.22 and a high of 1.65. This result clearly indicates these parts were all written by a single individual. The top-6-words were also identical in all three parts of the Bible with a repeated pattern-f (the, and, of, that, to, in). Even most of the top-6 3-word phrases repeated across the three parts. [...] “of the lord” and “the son of” repeated in all three parts. [...] Out of the 284 texts, “of the lord” only appears among the top-six 3-word phrases in the King James Bible and in the Verstegan-ghostwritten “Playfere” sermons; the absence of this phrase among common phrases in all of the other texts, makes it extremely likely a single author wrote both of these texts. (p. 293)

While it is conversationally known as The King James Bible, the King’s name does not appear in the byline on the first edition’s title page. (p. 293)


... Nowhere does the King James version claim it was authored by King James. This is, it is implied, suspicious. You can't make this up!
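
For what it's worth, the quoted passages lean on two kinds of surface counts: per-text rates of a single word ("fool") and the most frequent 3-word phrases ("of the lord"). Here is a generic sketch of how top trigrams are usually extracted; it is not the book's own code, and the sample is an invented line in a biblical register, not the KJV corpus.

import re
from collections import Counter

# Generic sketch of extracting the most frequent 3-word phrases (trigrams),
# the kind of count the quoted passages rely on. The sample is an invented
# line in a biblical register, not the KJV corpus itself.

def top_trigrams(text: str, k: int = 6) -> list:
    words = re.findall(r"[a-z']+", text.lower())
    trigrams = [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]
    return Counter(trigrams).most_common(k)

sample = ("And the word of the Lord came unto him, and the word of the Lord "
          "was a lamp unto his feet, and the servants of the Lord heard it.")

print(top_trigrams(sample))
# Phrases like "of the lord" dominate any text written in this register, which
# is why critics in this thread read shared top trigrams as a genre effect
# rather than as proof of a single hand.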

616Petroglyph
Dic 23, 2021, 6:20 pm

>615 Petroglyph:
What a load of bollocks.

The whole book is like that: a 696-page illustration of GIGO (Garbage In, Garbage Out).

617faktorovich
Dic 23, 2021, 6:27 pm

>615 Petroglyph: This is nice. Petroglyph is going to find the relevant sections and quote them for you guys as questions come up. Fantastic. I can focus on my translation of Jonson's "Variety". "James'" name is listed as the entity whose "Commandment" ordered the creation of this 1611 Bible, and it was printed by King James' official printer, as the title-page specifies. These elements have given this version the title "King James Bible", which most sources today tend to interpret as "King James' Bible" even though the apostrophe is not in the title. This is a minor point. I did not say that the title-page says "James" authored it. This distinction is only significant in comparison with the byline of "Daemonologie" (pro-witch-burning), which is firmly attributed to "James" today: its first edition was credited as "Printed to the Kings Majestie", which indicates the book was made "to" rather than "by" James. The later London edition is credited as "By King James VI of Scotland (King James I of England)", printed in Edinburgh. "James R." does appear at the end of the "To the Reader" opening remarks, but also without a "by" line.

618Taphophile13
Dic 23, 2021, 6:27 pm

>6 Crypto-Willobie:
What a load of bollocks.
and
>616 Petroglyph:
What a load of bollocks.

Aha! Crypto-Willobie = Petroglyph

619Petroglyph
Modificato: Dic 23, 2021, 6:33 pm

>618 Taphophile13:
Well, if you accept that the KJV and a few bible-quoting sermons have to be written by the same person because "of the lord" is among the top six most frequent three-word phrases, then that is a perfectly cromulent conclusion to draw.

I'll post a confirmation under the Crypto-Willobie moniker later.

620faktorovich
Dic 23, 2021, 8:16 pm

>619 Petroglyph: Out of the 600 words you quoted about the King James Bible above, you have picked only a 3-word phrase, and you are arguing that this is the only piece of evidence that led to my attribution of this title to Verstegan, ignoring all of the other 597 words in your own quote, or the thousands of other words I wrote on this topic in Volumes 1-2. An equivalent would be if you asked me to show you my pet elephant. I brought out the elephant into the room. You closed your eyes, nose and all other senses, and just touched a piece of the elephant's skin. Then you said that you know very well that only a bit of rough skin does not prove there is an elephant in the room, and that I have made an erroneous "conclusion" by insisting there is indeed an elephant in the room.

621Petroglyph
Dic 23, 2021, 8:47 pm

>620 faktorovich:
a) Your garbage debunks itself.
b) Picking on jokes because the meatier posts are harder to push back on?
c) I guess the thousands of words I've posted upthread don't count for anything in your view. Not even an imaginary elephant. Good to have that confirmation.

622guido47
Dic 23, 2021, 9:31 pm

Hi #600,

Yes, I do enjoy this phantasmagorical saga. I do not like popcorn, but I do like Buffy (also my shelter cat's name).

But...But... I would just like to thank you for that interesting book suggestion. I will definitely be buying that "Calvino" book.

A strange author, a bit like this thread. Should it be split by now?

Guido.

623SPRankin
Dic 23, 2021, 10:42 pm

The true author of the KJV? More like the HOLY Ghostwriter, amirite? Hey-oh! I’m here all week, folks. Try the veal and don’t forget to tip your waitress!

624Crypto-Willobie
Dic 23, 2021, 11:52 pm

What a load of bollocks!

I got it right way back when.

625faktorovich
Dic 24, 2021, 1:21 am

>623 SPRankin: Yes, I mentioned the Workshop joked about the Holy Ghost-Writer earlier in this thread. Those references were made by Verstegan in other theological texts, so he was probably referring to his translation of the Bible (speaking for the Holy Ghost), in addition to his ghostwriting for several priests' sermons etc. This is why I use the "ghostwriter" term. I'm glad you guys are starting to understand.

626Keeline
Dic 24, 2021, 2:10 am

>613 faktorovich: You wrote:

Thus, you are imagining "younger" apprentices setting type, when it could have instead been set by the author himself. You cannot argue with computational-linguistic evidence by disputing that you imagine how it can be wrong because you have imagined an army of young typesetters.


No. That is not what I wrote at all. I mentioned that apprentices in the print shop would be responsible for distributing type. This is the process by which type is cleaned and placed back into the typecase compartments. It requires less skill and experience than setting type. But it does make the apprentices more familiar with the typecase layout.

If a "u" was accidentally placed in the "n" compartment, the next time the letter was pulled, the wrong letter would be used. No cleverness was involved. It was part of the process.

This sort of thing is largely unknown by people who have only done desktop publishing. You can have a million of any character you wish without a thought. When you delete a block of text or are done with the project, there is no effort to restore the characters for their next use.

The folding of large sheets to form folios (1 fold), quartos (2 folds), octavos, sextodecimos, etc. did develop a standard layout for the pages (one definition of "imposing") so the printers would know where to place each page (also called a folio) and whether it was to be inverted. I have some story papers from the 19th C which are uncut and the layout is revealed in a very clear manner when it is opened up.

In this lengthy thread I have seen no reference to the Stationers Company records which provide a lot of information about who was doing the printing and often the writing as well. Some of these are more complete than any typical genealogical information for birth and death dates and locations. Some people can be traced better than others. Usually the more traceable people are the wealthy and/or royalty. Others may be represented by church records but these can be spotty, especially with the centuries of calamities that can destroy them.

Speaking of centuries, I keep seeing references to 400 years of placing William Shakespeare on a pedestal as if he was always thought to be one of the greatest writers of English literature. He was known in his day but did not hold that stature. The fact that there were pirated "quartos" of some of his plays during his lifetime and shortly afterward is a testament to this.

After all, it was Ben Jonson who collected his works, in his lifetime, and had certain followers. I think they were called something like the "Sons of Ben." No small ego there. Perhaps it was deserved and perhaps not. But almost none of his works are widely read or performed today and they are of a different character.

Shakespeare started to rise in stature between 100 and 250 years after he died — from the early 1700s to the early- and mid-1800s.

James

627faktorovich
Dic 24, 2021, 2:49 am

>626 Keeline: I explain the "u" and "n" deliberate typos in a section in the series. I don't think you are understanding what I am referring to in my brief summaries in this discussion. You should ask me for a review copy of the series (at director@anaphoraliterary.com), so you can read this entire section to catch my meaning. I am saying that in this specific instance the "u" and "n" were flipped deliberately, and you are saying that you think it was accidental and without "cleverness", without grasping that other errors were introduced into these copies while still other errors were corrected, in an obviously intentional manner. And your distinction between an apprentice who cleans/distributes type and one who sets it still indicates that, in either case, you are imagining a fiction that there were apprentices involved, and what they did, without any documentary or linguistic evidence that could prove what specifically apprentices did or did not do for this specific title.

There are many mentions of the Stationers Register (and Company) across my series. Yes, it does supply some records of the registering booksellers and in some cases the writers, but most of the entries are anonymous just like most title-pages. No, there are no birth or death dates, and certainly no extensive genealogical information in the Stationers' Register. There are some parishes with birth/burial records - maybe you are confusing these? Past scholars have assumed that poverty could have been the cause for the absence of some birth/death records, but my linguistic etc. data supports the conclusion that most bylines without clear birth/death records are very likely to have been pseudonyms without corresponding real people behind them.

The "pirated" "Shakespeare" and otherwise bylined quartos is a whole other topic, which I explain in several different sections, as scholars have used this term to apply to very different publishing/re-printing problems.

Yes, Jonson both puffed and ridiculed "Shakespeare", and Jonson was also the dominant comedic writer behind the "Shakespeare"-byline, so, as I explain in a section, when Jonson puffs "Shakespeare", he is really just puffing himself as he puffed himself directly as well. I explain the structural differences between Percy's and Jonson's "Shakespeare" plays in a section in the series, and in the structural "Shakespeare" patterns table in GitHub.

"Shakespeare" and Jonson were puffed far more than other writers of the Renaissance, and they both had the distinction of having extensive folios of their collected works published. There were very few such collections, a group that also includes the Jonson-dominant-ghostwritten "Fletcher" and "Beaumont" folio. Whenever "Shakespeare" rose in fame, he is now up there and so it's uniquely difficult to dislodge him.

628Petroglyph
Dic 24, 2021, 5:29 am

>627 faktorovich: "I explain the "u" and "n" deliberate typos in a section in the series. I don't think you are understanding what I am referring to in my brief summaries in this discussion"

There are no coincidences in conspiratorial thinking. There's always more context that can be spun to imply the preferred conclusion.

629prosfilaes
Dic 24, 2021, 12:54 pm

>576 faktorovich: Why don't you address my perfectly logical explanation regarding the similarity between Jonson's "Volpone" and the "Donne" poems

You claimed it was obvious, then when I said that it wasn't obvious, you said it was "co-ghostwritten by Jonson and Byrd" and said "this might lead to the conclusion that Byrd was the dominant hand behind "Valediction""; in which case there's no reason to think they would be similar.

And then you go into your 27 test system as if it were accepted. There has been considerable criticism of it from the inside, but I prefer to question it as a black box; we can put books in, and get a number out, and what that number means, whether it means anything at all, needs to be established by running it against a known sample and demonstrating it produces the expected results. Repeating over and over that your system comes to some conclusion we're skeptical of doesn't reduce our skepticism of that conclusion.
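
For concreteness, here is a minimal sketch of the known-sample check just described: feed the black box texts whose authorship is not in dispute and measure how often it recovers the accepted answer. The attribute function is a hypothetical stand-in for whatever method is under test, not any real system.

from typing import Callable

# Minimal sketch of validating an attribution method as a black box: run it on
# texts of known, undisputed authorship and measure how often it recovers the
# accepted answer. `attribute` is a hypothetical placeholder, not a real system.

def validation_accuracy(
    attribute: Callable[[str], str],      # black box: text -> predicted author
    known_sample: list,                   # list of (text, accepted author) pairs
) -> float:
    correct = sum(attribute(text) == author for text, author in known_sample)
    return correct / len(known_sample)

def naive_attribute(text: str) -> str:
    # Illustrative stand-in only: always guesses the same author.
    return "Author A"

known_sample = [
    ("an undisputed text by author a ...", "Author A"),
    ("an undisputed text by author b ...", "Author B"),
    ("another undisputed text by author b ...", "Author B"),
]

print(validation_accuracy(naive_attribute, known_sample))  # 0.33...
# Until a method clears this kind of test on uncontested texts, the numbers it
# produces for disputed ones carry little evidential weight.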

630MarthaJeanne
Dic 24, 2021, 1:29 pm

But those poems are not similar.

631faktorovich
Dic 24, 2021, 1:49 pm

>629 prosfilaes: I refer to the dominant ghostwriter in any given text as its ghostwriter in shorthand, without specifying the secondary, tertiary, or editorial contributions that were made to this text. This is necessary to avoid confusion, and because the dominant ghostwriter has written most of the text in question and thus should be credited as its main author. So, there is no contradiction between my simply saying that the "Donne" sonnets were Jonson's and my specifying that they were ghostwritten by Jonson as the primary and Byrd as the secondary ghostwriter. Regarding "Valediction", maybe an example will help. Let's say there is chocolate batter made by Baker A and vanilla batter made by Baker B, and the two are swirled together and baked into one cake. If the whole cake is chopped up into tiny pieces and the chocolate and vanilla parts are separated into containers and these containers are weighed, one can determine whether the cake in question is mostly chocolate or vanilla. But if we instead cut a slice of this randomly mixed cake, we might have a piece that is largely or entirely vanilla or chocolate, or an equal mix of the two, etc. "Volpone" is predominantly Jonson's without a strong secondary, while "Donne's" sonnets also have Jonson as their primary, but with a strong secondary contribution from Byrd when the cake is measured as a whole. "Valediction" is likely to be mostly Jonson's because I found the erotic fragments in "Volpone" that use similar words/erotic language (heart/melt). Jonson typically prefers to include homoerotic, erotic, and otherwise sexually suggestive content in his texts, whereas Byrd tends to avoid such sexual details and instead uses flowery romantic language. "Volpone" is similar to "Donne's" "Sonnets" as a whole, but there are tests etc. on which they differ because (to use the analogy) "Volpone" is nearly entirely chocolate, while the "Sonnets" are a mixture of mostly chocolate with a secondary portion of vanilla. Is this clearer? There is no black box out of which these conclusions have emerged; the precise data that led to these conclusions is available on GitHub for all to check.

The only way other computational-linguists can debate my conclusion is by running these 284 texts through their own methodologies and making all of their raw data available for the public to scrutinize, to see if they find 104 linguistic signatures in this corpus that match the current byline-assignments, or the 6 linguistic signatures my method has found, or some other combination of signatures. No other computational-linguist has attempted to test a Renaissance corpus this large, none have ever shared their raw data with the public, and none have used a combination of 27 different tests instead of just the word-frequency test that is the "standard" in this field. There is no rational reason to be skeptical of my results after looking over my series and my data; there is every reason to have rational skepticism about the currently established attributions of the British Renaissance.
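
(As a point of reference for readers, the "standard" word-frequency approach mentioned above is usually some variant of Burrows' Delta: function-word frequencies are z-scored across a corpus and texts are compared by the mean absolute difference of those z-scores. The sketch below is a generic toy illustration of that idea, not the BRRAM 27-test method; the word list and texts are placeholders.)

from statistics import mean, pstdev

FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it"]

def profile(text):
    # Relative frequency of each function word in the text.
    words = text.lower().split()
    n = len(words) or 1
    return [words.count(w) / n for w in FUNCTION_WORDS]

def burrows_delta(p_a, p_b, means, sds):
    # Mean absolute difference of z-scored function-word frequencies.
    z_a = [(f - m) / s for f, m, s in zip(p_a, means, sds)]
    z_b = [(f - m) / s for f, m, s in zip(p_b, means, sds)]
    return mean(abs(a - b) for a, b in zip(z_a, z_b))

corpus = {
    "Candidate 1": "the cat sat on the mat and it was the end of that tale",
    "Candidate 2": "a man of the town went to a house in the dark and it rained",
    "Candidate 3": "thou art the light and the way to all that is good in it",
}
profiles = {name: profile(text) for name, text in corpus.items()}
columns = list(zip(*profiles.values()))
means = [mean(col) for col in columns]
sds = [pstdev(col) or 1.0 for col in columns]  # guard against zero variance

disputed = profile("the end of the day and the start of it all")
for name, p in profiles.items():
    print(name, round(burrows_delta(disputed, p, means, sds), 3))
# The candidate with the smallest Delta is the closest stylistic match; any
# single statistic like this is only as good as the corpus it is calibrated on.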

632SandraArdnas
Dic 24, 2021, 3:37 pm

Your method has been shown lacking in this thread already. You just ignore it. Your conclusions are more often than not logically erroneous, so they are void even if the method were perfectly fine. You ignore that too. You pretty much ignore everything that does not fit your predetermined result. Once again, a travesty.

The vast majority of people here are highly educated in diverse fields. Even those who are not are highly read and thus above average in comprehension of all sorts of things. So for the love of god stop insulting everyone's intelligence here with your pseudo-arguments, demagoguery and the like. If it hasn't worked 50 times, it will not work the 51st time either. Just look at those polls. They clearly tell you the opinion of literally everyone, every single person following this 'discussion'.

633susanbooks
Dic 24, 2021, 4:50 pm

>632 SandraArdnas: Don't you know genius is never appreciated in its time?

634faktorovich
Dic 24, 2021, 9:51 pm

I am here to answer all questions that are raised about the research I presented regarding the correct attributions of the British Renaissance in my series. Whether readers agree or disagree, internally or with each other, is the volunteer job of the reviewers who read my arguments. My volunteer job here is to give out free information to guide readers to understand this subject. If you have more questions, I will continue answering them.

635susanbooks
Dic 25, 2021, 12:55 am

Happy Holidays to everyone who has made this thread so fun.

636anglemark
Dic 25, 2021, 3:55 am

>635 susanbooks: Indeed. This thread has been a joy to read. Happy holidays!

/The Anglemark member who does not hold a PhD in linguistics

637Petroglyph
Dic 25, 2021, 5:50 am

Happy holidays! From all of us to all of you!

638scaifea
Dic 25, 2021, 7:15 am

This thread is 100% the gift that keeps on giving, even more so than a jelly-of-the-month club subscription. Happy Holidays, everyone!

639lilithcat
Dic 25, 2021, 9:41 am

Happy Christmas, all. Shall we have a group read of The Winter's Tale by whoever that guy was?

640FAMeulstee
Dic 25, 2021, 9:52 am

>639 lilithcat: If you mean Italo Calvino's If on a winter's night a traveller, I am in.

641lilithcat
Dic 25, 2021, 10:22 am

>640 FAMeulstee:

Are you sure that's by Calvino? I've heard that his books, as well as those by Umberto Eco, Andrea Camilleri, Natalia Ginzburg, and a host of others, were actually written by a small cabal calling themselves "Gli scrittori ignoti"

642SandraArdnas
Dic 25, 2021, 10:25 am

LOL

Merry Christmas everyone

643faktorovich
Dic 25, 2021, 10:27 am

>641 lilithcat: Are you trying to refer to Siena’s collaborative anonymous writing Workshop, the Accademia degli Intronati (Italian for "the Academy of the Stunners"), which peaked in prominence in the mid-sixteenth century and is best remembered for its comedy "Gl’ingannati"?

644FAMeulstee
Dic 25, 2021, 10:59 am

>641 lilithcat:

No clue anymore what to believe about authorship ;-)
But I am sure about the title!

645SandraArdnas
Dic 25, 2021, 11:01 am

Correction, merry Christmas everyone and their ghostwriter

646paradoxosalpha
Dic 25, 2021, 11:31 am

This thread has reminded me of one of my favorite James Morrow stories: his sequel to A Christmas Carol, "The Confessions of Ebenezer Scrooge." A ghostwriter is haunting Christmas--the ghostwriter of Communism.

647faktorovich
Dic 25, 2021, 11:53 am

>646 paradoxosalpha: I think you meant to say "...The Ghostwriters of the Commune"?

648paradoxosalpha
Modificato: Dic 25, 2021, 1:27 pm

Well, no. I meant to write what I did write: Morrow is the unconcealed "ghostwriter" in the voice/byline of Scrooge. He has the Xmas specters return to demonstrate that Scrooge's philanthropy is only suppressing the necessary socialist revolution. It's a gem of a tale.

649bnielsen
Dic 26, 2021, 2:36 pm

Merry Christmas from Nicolas Bourbaki and myself.

650lilithcat
Dic 26, 2021, 2:40 pm

651faktorovich
Dic 28, 2021, 8:45 am

"Experience teaches us
That resolution is the sole help we need.
And this, my Lord, our honor teaches us:
That if we are bold in every enterprise;
Then, since there is no way other than to fight or die,
We must be resolute, my Lord, for victory." --"Locrine" (1595), "Newly set forth, overseen and corrected, By W. S.", included in "Shakespeare's" Third Folio (1664), ghostwritten by Josuah Sylvester

652Shookie
Feb 17, 2022, 7:53 pm

I gave this book a good review because I found it very interesting and not because I am swayed by the author's study. Perhaps this is not the proper way to review a book. I think many LibraryThing members are getting extremely involved in debating the caliber of the author's academic reasoning and that is a discussion that I won't even profess to being erudite enough to tackle.
I have, however, just read the author's interview with LibraryThing's own AbigailAdams. I am appalled at the arrogant audacity Dr. Faktorovich shows toward her fellow scholars, past or present. Indeed, there is an egotistic bent to the entire interview, with Dr. Faktorovich never answering with what might have been a brief response but instead discussing her other books, the unrealistic (in my mind) amount of reading she does, or her many translations, most of which are the first time the said manuscript was translated. Her constant use of words like "absurd", "nonsensical", "irrefutable", "obviously", and "claimed... but clearly" was offensive to me. And does it not seem a bit off that Dr. Faktorovich's "translations are accompanied by annotations, introductions and primary sources that add thousands of pieces of evidence that confirm the re-attribution made in the central study"? Generally, when one strongly believes something they will find evidence for that belief everywhere or else not include dissenting information.
I have to admit that I am far more familiar with academic papers on medical subjects than those on literary subjects, however. I do not remember seeing any with this type of omniscient attitude. I am quite disappointed in Dr. Faktorovich as a researching scholar.

653faktorovich
Feb 19, 2022, 3:11 pm

>652 Shookie: My article, "Manipulation of Theatrical Audience-Size: Nonexistent Plays and Murderous Lenders", is forthcoming in "Critical Survey's" winter 34.4 issue. I am nearly finished with the translation of a second volume in the second half of this BRRAM series, which I returned to a couple of months ago. I'll post a giveaway for the remainder of the series in Early Reviewers when it is finished.

If BRRAM was merely interesting and not convincing, I would not have needed the three years the full series is going to take to complete, but rather I could have just written a shocking little article that made all sorts of unsupported assertions. Instead, the overwhelming evidence BRRAM's 28 or so volumes will provide should convince even the staunchest doubter who actually reads their content. If you rationally compare quantifiable elements of the "caliber" of the evidence provided by previous established scholars in the field of Renaissance re-attribution versus BRRAM, the rational conclusion must be that I have succeeded in raising the "caliber" to a new level of precision. Past researchers in this field are, simply, all wrong in their re-attribution conclusions. An equivalent example: if I had found the cure for cancer and explained why previous researchers in that field were wrong about the drugs, surgeries and other methods they had previously advanced as curative, it would be irrational for critics to dismiss my findings simply because I had to not only explain the accuracy of my method, but also prove why and how past research was incorrect. All researchers who make a scientific finding prove their predecessors wrong. It is not arrogant for them to explain other researchers' errors, but rather an essential step in the scientific method. Dancing around the issue by inventing solutions for how past researchers were partially correct is likely to introduce new errors and to continue the spread of false beliefs. The criticism that I read too many books is one I expected, as Gabriel Harvey received the same criticism when he was denied progress in his academic career at Cambridge and lost his Rhetoric fellowship there by 1591, and afterwards had to rely on ghostwriting to make a living. If you actually read the said translations, you will find I include several annotations that show linguistic matches to ghostwriters other than the dominant hand, and I explain these and use them to adjust my understanding of the Workshop's writing process. But in most cases, the research I am doing for these annotations confirms matches to the dominant ghostwriter for each of these texts. In general, if you are going to dismiss the capacity of any "researching scholar", you have to actually cite specific errors you have found in the books you are dismissing. Just saying that I read too much and that you imagine I am biased by my beliefs does not prove anything other than that you are biased against me personally.

654SandraArdnas
Feb 19, 2022, 3:46 pm

You could spend a lifetime on something and still not be convincing. It's called faulty reasoning, and the fact that you think the amount of time spent on something equals its veracity is just one of many such errors you've demonstrated.

I'd stop embarrassing myself by criticizing a self-proclaimed layperson's comment for not citing specific errors since you ignored the errors pointed out to you by people educated in related fields. Consummately ignored numerous errors pointed out to you. Blindly. Willfully.

655faktorovich
Feb 19, 2022, 8:51 pm

>654 SandraArdnas: No, I need to spend precisely around another 10 months on this project to finish translating the specific texts I have chosen to prove this case beyond all possible doubt. These texts are not random translations, but rather each prove a specific component of the larger argument regarding the six ghostwriters who created the British Renaissance. For example, today I was translating Act V of Harvey's "Brandon"-bylined "Virtuous Octavia"; this is the only text published under this obvious pseudonym without any proven biographical existence of a "Brandon"; this particular play explains the disagreement Harvey had with his student, Percy, regarding if virtuous characters who refrain from violence can be the heroes of a play, and it thus explains the hyper-anti-reaction against this portrayal that Percy had across most of his homicidal-suicidal "Shakespearean" tragedies; and for the introduction, I have also translated letters between Harvey and Byrd that prove that they were working together at Cambridge with Byrd helping meter and rhyme Harvey's long poetic works (leading to the tertiary trace of Byrd's signature in "Octavia"), etc. etc. The previous play I translated was Jonson's, and I similarly used the extensive introduction and internal annotations to prove the case regarding Jonson's linguistic-signature by searching for other types of evidence. After "Octavia", I am going to translate self-attributed rhetoric textbooks of Jonson and Harvey, which prove their pedagogic familiarity with the profession of writing. After I finish this set, I will have run out of strings of evidence that I have found necessary to explore when I designed this series to reach a definitive conclusion to the mystery of the attribution of the British Renaissance.

If you do not read any of the 28 or so books that will be in this BRRAM series, obviously you cannot be convinced by something you are ignorant of. There are some lawyers who win cases by just telling jokes, or playing golf with the judges. There are indeed no lawyers out there who have written 28 books to prove their cases with precedents etc., but that's because judges have a word-limit on briefs etc. Then again, some judges' opinions can stretch into books, when the complexities require specificity. It would be absurd if the page-count of the argument alone won the case, and I never said anything of this sort. I have never ignored an "error" pointed out to me; I always consider whether it is indeed an "error" or a false claim of erroneousness. Across this discussion, I have not come across a single legitimate error anybody has raised with the BRRAM series. BRRAM's conclusions are correct, which you would realize if you actually read the series, instead of accusing me of being willfully blind without explaining this exaggerated insult; willful blindness is when somebody refuses to consider research, as you are doing with BRRAM.

656MarthaJeanne
Modificato: Feb 20, 2022, 1:55 am

Do you really think that posting these things is going to encourage anyone to read even one book of yours, never mind 28?

If what you have posted here is a sample of your writing, it doesn't really matter whether your ideas are right or not.

657faktorovich
Modificato: Feb 20, 2022, 12:00 pm

>656 MarthaJeanne: Why would you imagine that the point of my posting free information here about who really wrote the British Renaissance is to solicit purchases of the series? I have repeatedly offered to give free review copies of the first 14 books in the series to anybody who emails me at director@anaphoraliterary.com. I'm semi-retired. Sales aren't something I care about. I care about making this information accessible to the public. I have given away over 20,000 free pdf copies of the BRRAM series so far.

You are saying something nonsensical. You are suggesting you have glanced at how what I have written here looks, without reading or caring about the "ideas" I have communicated, before posting derogatory remarks about ideas you have not actually read. How can any amount of writing posted anywhere not be a "sample of your writing"? That is a nonsensical question. The point is that if you are willfully ignorant and do not want to read (when books are offered for free), you are not somebody who can be trusted to evaluate research.

658clamairy
Feb 20, 2022, 3:30 pm

>656 MarthaJeanne: Thank you for wording my thoughts so perfectly.

659faktorovich
Feb 20, 2022, 9:02 pm

>658 clamairy: If your thoughts have been "perfectly" expressed by somebody else, there would not be a need to repeat or puff those words, as perfection does not benefit from addition or inflation.

660clamairy
Modificato: Feb 20, 2022, 10:39 pm

Are you this funny in person, or only via keyboard?

661SandraArdnas
Feb 21, 2022, 2:12 am

>655 faktorovich: Why would anyone waste their time on scholarly work by someone whose reasoning is abominable and discussion abilities non-existent? Both are a prerequisite for valid work. You've shown none, just a lot of self-congratulatory nonsense. Thanks, but no thanks.

662MrAndrew
Modificato: Feb 21, 2022, 7:07 am

>657 faktorovich: >656 MarthaJeanne: said "read", not solicit or purchase.

Kudos for using "nonsensical" twice in one post, though.

663faktorovich
Feb 21, 2022, 12:13 pm

>660 clamairy: Based on the quantity of laughter I have heard in the college classes I have taught, I believe I am indeed equally funny in person. Though it is a low bar, since this forum has judged me to be significantly less funny than the humorists who have been responding to me. My goal is to educate, and not to entertain, but sometimes a joke is more effective in delivering an educational message in a digestible form.

664faktorovich
Feb 21, 2022, 12:19 pm

>661 SandraArdnas: Have you asked yourself why you are wasting time commenting about how reading would be a waste of your time? "Discussion abilities"? A "discussion" is "the action or process of talking about something in order to reach a decision or to exchange ideas". Instead, you are talking about nothing, as you are not using any specifics. And in addition to failing to even read any of my ideas, you are also failing to communicate any of your own "ideas". The only element you are meeting is that you are reaching a "decision"; but reaching insulting conclusions that are false and malicious regarding somebody else's 28 books is called "libel" and not a "discussion".

665faktorovich
Feb 21, 2022, 12:22 pm

>662 MrAndrew: If my response necessitates the use of "nonsensical" twice, you really have to re-read what I am responding to more closely to check if it is indeed extremely nonsensical, as you might be misunderstanding how it is nonsensical if you are averse to reading.

666Crypto-Willobie
Feb 21, 2022, 2:26 pm

I assumed we were going for a thousand posts. What's the LT record for a single (non-continued) thread?

667anglemark
Feb 21, 2022, 3:35 pm

>666 Crypto-Willobie: I have never seen one this long before, at least. Will we get a badge?

668lilithcat
Feb 21, 2022, 3:40 pm

I just want to point out that Crypto-Willobie’s post bears the mark of the Beast.

Make of that what you will.

669clamairy
Modificato: Feb 21, 2022, 3:51 pm

>668 lilithcat: I was very excited about that and didn't want to reply and ruin it.

We have a thread in The Green Dragon that is 891 posts long. I hesitate to share the link because it should stay buried.

670faktorovich
Feb 21, 2022, 8:41 pm

>666 Crypto-Willobie: I am going to continue responding as long as somebody is asking questions or commenting on my research. A thousand or ten thousand posts doesn't make much difference to me. 10,000 posts at 200 words each would only be 2 million words. BRRAM is currently at 1,644,435 words, and I have only just started its second half. So if we reach 2 million words here, it will be less than BRRAM will eventually add up to. I don't think there is a post or word number that wins a badge on LibraryThing; just the joy of writing.

671faktorovich
Feb 21, 2022, 8:51 pm

>668 lilithcat: The "666" "Mark of the beast" superstition comes from Verstegan and Harvey's translation of "King James Bible": "And that no man might buy or sell, save he who had the mark, or the name of the beast, or the number of his name./ Here is wisdom. Let him who has understanding count the number of the beast: for it is the number of a man; and his number is Six hundred threescore and six." I discuss the superstitions, demons and other monstrosities the Workshop made up in my translation of Percy's "Thirsty Arabia". The main published source that mentioned demon names in the Renaissance was “Johann Weyer’s” "Pseudomonarchia Daemonum" (1577). The first English translation of “Weyer” was “Reginald Scot’s” "The Discovery of Witchcraft" (1584). The Workshop used varied spellings of these names to make it sound as if they came from ancient sources, whereas these names never appeared before the Workshop made them up. Thus, if you read BRRAM closely enough, you will find the reference to "666" in this passage is yet another Workshop hoax that is still believed by millions of modern humans (alongside the "Shakespeare" pseudonym, and the books they published that promoted burning witches attributed to "James I").

672melannen
Feb 21, 2022, 8:59 pm

>652 Shookie: As someone who reads a lot of psychoceramics books for fun, I don't think you're doing it wrong if you review a book well because it was an enjoyable, interesting read and it introduced you to new ideas and concepts! I have read a lot of books that were 100% wrong about everything but were super fun to read and had a lot of interesting facts in them, even if the analysis was all silly. You can pull The Book of the Damned out of my cold dead hands.

But I think it is important - not just as a reviewer but also as a reader - to learn how to tell when something is not making a good argument, though, and to note that in a review when you can as a warning to others (even if it's a generally good review!)

673SandraArdnas
Feb 22, 2022, 8:00 am

>664 faktorovich: I haven't asked myself, I know. Because your self-satisfied rants deserve it, and until such time as I and others stop opening this thread to read your babble, you'll just have to live with the feedback. Unless you actually respond to the issues people raised throughout the thread, you're just a hack. You can write 28 more books and have 3 more PhDs, you'd still be a hack if you only ever talk to yourself, which you do. Stop criticizing others and look in the bloody mirror. Let me reiterate this one more time: You haven't in any way whatsoever addressed any, not a single one, of the objections raised, the very specific ones you otherwise ask for. Hack.

674abbottthomas
Feb 22, 2022, 9:47 am

What is the point of agitation and anger? Isn't it time this thread was allowed to fade away into well-deserved obscurity?

675clamairy
Modificato: Feb 22, 2022, 10:11 am

>674 abbottthomas: Definitely. It's almost a textbook example of Last Word Syndrome. (I'm sure there's a better name for it, but Google isn't revealing it today.)

676Cynfelyn
Feb 22, 2022, 10:13 am

>674 abbottthomas: As various Avengers put it to the Hulk, "The sun's going down, it's getting real low".

677faktorovich
Feb 22, 2022, 12:20 pm

>672 melannen: Publications under the "Josiah Stinkney Carberry" pseudonym of the fictitious Brown University professor who specializes in the field of "cracked pots" or "psychoceramics" are a great example of a strategy the Workshop is likely to have started. By using absurdly named pseudonyms ("Shake-spear", "A Monday", "John Done"), they could claim these were satirical jokes if anybody investigated whether there was any documentary proof of these bylines' existence in the real world. Fictitious pseudonyms in academia could be used by a professor to collect multiple salaries or multiple fellowships (the latter might not have involved teaching, so no chance of discovery for this non-existent scholar). The more traditional fraud is for academics to pay for paper-degrees, to pay a ghostwriter to write their dissertation, and then to have scholarly articles/books ghostwritten that win them tenure. The use of fictitious professors might seem to be a harmless practical joke, but this type of academic fraud can be one of the reasons scientific progress has come to a near-halt in the world despite billions of literate people on the planet, who could all be researchers if the gatekeepers were not corrupted by purchased-professorship schemes that occupy top research positions that are most likely to win funding, be published etc.

A fact is a thing known to be "true", so if any book has some "facts", it cannot be "100% wrong" or untrue. If you care about what has been actually going on in the world in the last 500 years, you really just have to read my BRRAM series. If I am still interested in writing more on this topic after nearly 2 million words, there are daily discoveries I am making that are of acute interest to all.

678faktorovich
Feb 22, 2022, 12:43 pm

>673 SandraArdnas: When you refer to a “hack” writer, you seem to be using it as an insult. I have explained earlier in this discussion that I have worked as a ghostwriter before. According to the Collins English Dictionary, a “hack writer” is “a writer of undistinguished literary work produced to order”. I have indeed done some undistinguished literary ghostwriting for contracted sums of money, so I am indeed a “hack writer”. But I have only done a few such assignments, or a tiny percentage of my writerly output under my own byline. One of the reasons I did these contracts is to research this ghostwriting process, so that I can apply these findings to my studies, such as BRRAM. As I explain in BRRAM, the British Renaissance Ghostwriting Workshop was started when a couple of “hack writers”, Verstegan and Harvey, began working as secret-secretaries for Elizabeth I, Vere, and other aristocrats and government officials. “John Florio’s” “Queen Anna’s New World of Words, Or, Dictionary of the Italian and English Tongues” (1611) includes a revealing definition for the Italian term “secretario” as “a secretary, a secret-keeper”. The same page also defines “secreta” as “a thin steel cap or close skull worn under a hat. Also the name of a place in Venice where all their secret records or ancient evidence is kept, as in Westminster Hall.” And “secreto” is defined as “secret, hidden, privy, alone, unknown…” These definitions are significant because they explain that the position of the “secretary” was designed to be the secret-ghostwriter performing most of the work, remaining “unknown” or “alone” in their study, while their boss (monarch, prime minister etc.) received the fame, credit and public acknowledgement. Thus, it is very strange that you imagine “hack writer” is a pejorative term, when it is actually the job function that is responsible for most of the scientific, literary and general progress of the last 500 years. Harvey and Verstegan were perhaps better secret-secretaries and hacks than their modern equivalents, but they remain equally “undistinguished” or “unknown” under their own bylines.

679faktorovich
Feb 22, 2022, 12:47 pm

>675 clamairy: The Urban Dictionary defines "Last Word Syndrome" thus: "A compelling need to be the last person to speak during an argument or conversation; finishing an argument with a response of immaturity typically consisting of either repeating the last thing you said over and over until the other person stops talking, making whiny/baby noises, or childish name-calling." This might be what others in this thread are doing, but I am adding new things, and refraining from repeating anything I have said before. I don't know why you are struggling with repetition... Perhaps if you read more books, you will have more new things to say.

680clamairy
Modificato: Feb 22, 2022, 2:07 pm

>679 faktorovich: Did I point the finger at you specifically regarding the LWS?
(But you are just as guilty as the rest of us.)

681SandraArdnas
Feb 22, 2022, 2:51 pm

>678 faktorovich: No, I mean a hack scholar

682faktorovich
Feb 22, 2022, 9:14 pm

>681 SandraArdnas: So your term would be defined as: "a scholar of undistinguished literary work produced to order"? Nobody has contracted me to write BRRAM. I am doing it without any payment, as I instead invest money into it, and as I said have given 20,000+ copies of it away for free. If you take out "produced to order", all that's left is "undistinguished"; and this simply means that you are using your power as a reviewer to claim my work has not been recognized by others and this means it deserves being summarized with a derogatory term. All the lack of "distinction" means is that I have not paid for any Kirkus etc. sponsored reviews, or paid for award competition fees to win the pufferies that would put a bow of fame on BRRAM. I don't know why you care about distinction; I just care about researching and communicating the true attributions of the British Renaissance because it is a period that has historically taken up an enormous chunk of the school-curriculum.

683melannen
Feb 22, 2022, 9:18 pm

>677 faktorovich: You know, you're right, it *would* make a lot of sense if faktorovich was actually fifteen different people ghostwriting under a pseudonym.

684SandraArdnas
Feb 23, 2022, 2:51 pm

>682 faktorovich: I am really not sure whether you actually believe you're doing yourself any favors here. You've thoroughly disgraced yourself as a scholar and intellectual. Bye

685faktorovich
Feb 23, 2022, 3:17 pm

>683 melannen: Now you are saying I am so prolific, you believe I cannot be fewer than fifteen different people using the "Faktorovich" pseudonym? Are you attempting to give me a compliment? I guess writing 14+ books annually is an intense writerly output, but that's normal for any writer who writes as a full-time+ job. That's how the Workshop created the entirety of the published British Renaissance output between the six of them. My writing speed on this project is part of how I am proving the case they could have done it.

686faktorovich
Modificato: Feb 23, 2022, 3:29 pm

>684 SandraArdnas: I believe that by writing in this discussion I am doing the public "favors", which is a synonym for a free "service". I own my own publishing company; I cannot be "disgraced", in the sense of losing my position of power as a publisher, as I am not planning on firing myself. Neither have any of the points raised in this discussion actually been rational reasons for discounting BRRAM's findings. My computational-linguistic method has derived the correct re-attributions for the Renaissance, as I have sufficiently explained here. All you guys have succeeded in doing is "discrediting" it in the dictionary sense of this word, or doing "harm" to my "good reputation", by insulting my findings and me personally, without providing any substantiating evidence to support these malicious accusations. Similarly, Judge John Hathorne condemned nineteen Salem "witches" to die by hanging by "discrediting" them as immoral and evil through the hearsay accusations of a crazed mob. I think it is far more honorable and moral to be on these executed innocent women's side, and not on the side of the homicidal judge who believed in unfounded accusations, without research, to the point of massacre.

687SandraArdnas
Feb 23, 2022, 4:14 pm

>686 faktorovich: If you want to salvage some self-respect, then respond to the issues raised. That is all. If you were actually able to do it, you would have done it already. You can't.

688faktorovich
Feb 23, 2022, 8:37 pm

>687 SandraArdnas: I have responded in full to every single "issue" that has been raised. If there is an "issue" you believe I missed, please quote what this issue is, and I will address it (again). My self-respect is perfectly intact; I am as sure about my re-attribution conclusions as I was when I finalized the six ghostwriters' names a couple of years ago. At issue is not respect (from myself to myself, or from others towards me), but instead simply that there is absolutely no quantity of evidence that would be enough for you to state publicly that you agree with my re-attributions. My claim that six ghostwriters wrote the British Renaissance is simply too history-changing and too impactful to the overgrown industry of paper-mills, ghostwriting, and secret-secretaries for any "insider" to acknowledge its verity.

689abbottthomas
Feb 24, 2022, 4:24 am

>688 faktorovich: As I see it, your claim that ….six ghostwriters wrote the British Renaissance is simply too history-changing and too impactful to the overgrown industry of paper-mills, ghostwriting, and secret-secretaries for any "insider" to acknowledge its verity. has too much in common with the bizarre conspiracy theories that abound in our internet age. Your ideas are too far removed from what I have observed and learned about human behaviour in the course of a long life.

There are, apparently, many people who are prepared to believe that world leaders are giant, shape-shifting lizards. I don’t think I have seen a single contributor to this thread support your ideas.

690MrAndrew
Feb 24, 2022, 6:09 am

I support the giant, shape-shifting lizard hypothesis. It explains so much.

691HugoDarwin
Modificato: Feb 24, 2022, 6:14 am

This user has been deleted because they were considered spam.

692melannen
Modificato: Feb 24, 2022, 12:06 pm

>690 MrAndrew: Please read up on the lizard hypothesis before you throw your hat in. :( I too believe in a race of shapeshifting alien lizards, but only the ones who live on an invisible island in Lake Michigan and are only here to play jazz (in Lizard Music). The broader theory goes places you may not want to support really fast and is actually taken seriously by a lot of people.

693paradoxosalpha
Feb 24, 2022, 11:58 am

Yeah, I either answer directly to a lizard or I am one, as far as I can tell. I find the explanatory power of the "theory" tenuous, no matter how superficially amusing.

694melannen
Modificato: Feb 24, 2022, 12:09 pm

>693 paradoxosalpha: Well, according to strict scientific cladistics, you are a reptile! I am also a reptile. We are also fish.

Sadly we are not shapeshifting extraterrestrials though. (Probably.) And we are definitely not lizards.

695faktorovich
Feb 24, 2022, 2:01 pm

>689 abbottthomas: If you think the idea that six ghostwriters created the British Renaissance is unbelievable, you haven't been paying attention to the news. You haven't even been paying attention to this thread, as >672 melannen: melannen pointed to the "Josiah Stinkney Carberry" pseudonym of the fictitious Brown University professor. A fictitious professor is an extreme example of the power pseudonyms and ghostwriters can have over academia and publishing.
1. The 2012 Harvard cheating scandal: https://www.nytimes.com/2013/02/02/education/harvard-forced-dozens-to-leave-in-c... In one Harvard scandal over half of a class was caught with near-identical answers on a final exam, and TAs were involved in helping them cheat by handing out answers.
2. There would not be a need for any Harvard student to cheat if they were indeed the smartest students in the world, and had not first cheated on the SAT, or otherwise corrupted the admission process, as the "2019 college admissions bribery scandal" proved.
3. "Gwyneth Paltrow’s Cookbook Scandal": "Julia Moskin published an article in the New York Times about her experience ghostwriting cookbooks for famous chefs", which was denied by Paltrow, who insisted she wrote her own cookbook... Who believes that?
4. "John F. Kennedy won the Pulitzer Prize for his 1957 book Profiles in Courage... In all likelihood, it was ghostwritten by Ted Sorensen, Kennedy’s speechwriter". There have been "questions" or authorship-attribution discussions about this book since. My method can solve this question definitively. Somebody who just hired a ghostwriter should not be credited with the honor of a Pulitzer Prize to their byline.
5. "The New York Times has outlined recent court documents revealing that ghostwriters paid by drug giant Wyeth Pharmaceuticals played a major role in producing 26 scientific papers published in medical journals that backed the use of hormone replacement therapy in women. That supposed medical consensus benefited Wyeth directly, as sales of its HRT drugs Premarin and Prempro soared to nearly $2 billion by 2001... Wyeth now faces over 8,400 lawsuits* from women who claim that the company’s hormone replacement therapy drugs caused them to develop illnesses."
6. Here are some plagiarism scandals: https://examples.yourdictionary.com/what-are-famous-examples-of-plagiarism.html
7. New York's Governor was just forced to resign in part because he used his staff to ghostwrite a book for which he received millions. This case can be categorized as a secret-secretary scandal, since the name of the staff member who did the ghostwriting has not been publicized.
8. The paper-mill problem is so overgrown that there have been several cases such as: "January 2021, Fisher retracted 68 papers from the journal, and editors at two other Royal Society of Chemistry (RSC) titles retracted one each over similar suspicions; 15 are still under investigation" (https://www.nature.com/articles/d41586-021-00733-5).
So, please clarify which of these major-newspaper-covered stories you believe to be equivalent to "shape-shifting lizards" in believability. My conclusions are equally truthful; they simply take this broad picture of the different types of writerly/publishing fraud and find it in the Renaissance, a period that is uniquely difficult to access due to the centuries between us.

696melannen
Modificato: Feb 24, 2022, 2:22 pm

>695 faktorovich: I didn't deliberately point to Carberry actually, I just used a synonym for "the study of cracked pots" that's still current at Brown University, also home to Carberry (who is an in-joke, not a fraud, for the record! There is a difference), while talking to somebody else about philosophies of book reviews.

Sorry if you've been in situations where an injoke was part of an 'extreme of power' issue.

697faktorovich
Feb 24, 2022, 4:13 pm

>696 melannen: That is precisely the disagreement between all of you and myself. You think that you can simply put a positive spin on identity/academic fraud, turning it into an "in-joke". On the other hand, I am considering the crime of using a fictitious professor name by the legal definition of "identity fraud". Thus, my point of view is the rational legal perspective, whereas you are for coloring fraud with a unicorn of happy-thoughts or painting it as a harmless amusement of insiders like those in Skull and Bones at Yale. Jokes or funniness can be made about anything from the Holocaust to racism to sexism to, apparently, ghostwriting and academic fraud. But jokes are not science, unless you are studying the components of humor. Jokes do not trump scientific findings of who actually wrote the Renaissance. Believing that jokes alone can discredit a scientific conclusion is like believing that joking or eliciting laughter about the Earth being flat is sufficient for this theory to be taught as the truth in textbooks.

698prosfilaes
Mar 12, 2022, 3:25 pm

>694 melannen: I've never heard that before. We are fish, yes, but not reptiles. I can't read that whole article, unfortunately, but Wikipedia* says what I understood, that among the amniotes, there are two cladistic groups, the mammals (including some long-extinct creatures often called mammal-like reptiles), and the reptiles (including birds).

* https://en.wikipedia.org/wiki/Amniote and https://en.wikipedia.org/wiki/Sauropsida

699prosfilaes
Mar 12, 2022, 3:55 pm

>671 faktorovich: The "666" "Mark of the beast" superstition comes from Verstegan and Harvey's translation of "King James Bible": "And that no man might buy or sell, save he who had the mark, or the name of the beast, or the number of his name./ Here is wisdom. Let him who has understanding count the number of the beast: for it is the number of a man; and his number is Six hundred threescore and six."
... Thus, if you read BRRAM closely enough, you will find the reference to "666" in this passage is yet another Workshop hoax that is still believed by millions of modern humans...


Revelation 13:17-18:

KJV: 17 And that no man might buy or sell, save he that had the mark, or the name of the beast, or the number of his name. 18 Here is wisdom. Let him that hath understanding count the number of the beast: for it is the number of a man; and his number is Six hundred threescore and six.

NIV: 17 so that they could not buy or sell unless they had the mark, which is the name of the beast or the number of its name. 18 This calls for wisdom. Let the person who has insight calculate the number of the beast, for it is the number of a man. That number is 666.

Sure, modern translators are just following blindly with the KJV.

Wycliffe: 16 And he schal make alle, smale and grete, and riche and pore, and fre men and bonde men, to haue a carecter in her riythoond, ethir in her forheedis; that no man may bie, 17 ethir sille, but thei han the caracter, ether the name of the beeste, ethir the noumbre of his name. 18 Here is wisdom; he that hath vndurstonding, acounte the noumbre of the beeste; for it is the noumbre of man, and his noumbre is sixe hundrid sixti and sixe.

Naturally, they went back and changed all copies of Wycliffe to match their interpretation.

Luther's translation: 16 Und machte allesamt, die Kleinen und Großen, die Reichen und Armen, die Freien und Knechte, daß es ihnen ein Malzeichen gab an ihre rechte Hand oder an ihre Stirn, 17 daß niemand kaufen oder verkaufen kann, er habe denn das Malzeichen oder den Namen des Tieres oder die Zahl seines Namens. 18 Hier ist Weisheit. Wer Verstand hat, der überlege die Zahl des Tieres; denn es ist eines Menschen Zahl, und seine Zahl ist sechshundert und sechsundsechzig.

Google translate: And made everyone, small and small, rich and poor, free and slave, that there was a mark on their right hand or on their foreheads, so that no one can buy or sell unless he has the mark or the name of beast Or the number of his name. Here is wisdom. Whoever has understanding should consider the number of the beast; for it is a man's number, and his number is six hundred and sixty-six.

And of course, the Germans obediently went along with changing all copies of Luther's translation.

Vulgate: 16 Et faciet omnes pusillos, et magnos, et divites, et pauperes, et liberos, et servos habere characterem in dextera manu sua, aut in frontibus suis. 17 et nequis possit emere, aut vendere, nisi qui habet characterem, aut nomen bestiæ, aut numerum nominis eius. 18 Hic sapientia est. Qui habet intellectum, computet numerum bestiæ. Numerus enim hominis est: et numerus eius sexcenti sexaginta sex.

Google translate: 16 And he shall make all the small, and the great, and the rich, and the poor, and the children, and the servants, to have a character on their right hand, or on their foreheads. 17 and no one can buy or sell unless he has the character, or the name of the beast, or the number of his name. 18 This is wisdom. He that has understanding, let him count the number of the beast. For it is the number of a man, and his number is six hundred sixty-six.

Continuing to make claims that are so distant from reality doesn't make much of an argument for your base claim.

700Crypto-Willobie
Mar 12, 2022, 5:58 pm

700!!

701ScarletBea
Mar 13, 2022, 9:01 am

Is this still going on? oh my...

702paradoxosalpha
Modificato: Mar 13, 2022, 1:48 pm

>699 prosfilaes:

There is a variant preserved in ancient MS that gives 616. But 666 was in the medieval Vulgate as you note, and has been taken up independently by the overwhelming majority of modern translations.

703faktorovich
Mar 13, 2022, 3:56 pm

>699 prosfilaes: I did not explore the specific "666" question prior to this group's mention of it in a joke, so I gave my brisk impressions on the subject. For BRRAM, I am now working on translating Verstegan's self-attributed "A Restitution of Decayed Intelligence" (1605), and in the annotations/introduction to it I will explain Verstegan's role in setting the history of religion in England in this text, as well as in his ghostwriting of the translation of the King James Bible. In my response regarding "666" I explained that I have previously researched the names of demons that the Workshop made up in their different books on demonology and witchcraft; these names were later repeated as if they were theological facts. Instinctively, it felt logical that they also erroneously translated earlier versions of the bible to give "666" or "616" a more significant demonic slant. But on further pondering, turning letters into numbers, or numbers into symbols is an essential part of the Hebrew Old Testament and its appended books such as Kabbalah; thus, it is indeed more likely that the "666" and other numerical references were taken from these earlier versions, versus being made up by the Renaissance Workshop. However, the "666" reference is in the New Testament in "Apocalypse of John", so these numeric references are indeed curious. On searching for "six" in the Wycliffite New Testament in Middle English, I found a few other mentions of six in the "John" Chapter: 1. In line 6 of Chapter 2: "And there were set six stonun cans, after the cleansing of the Jews, holding each between either three metretis." (The repetition of 6 in the number of the line and in the number of cans and also as a double of "three" builds an anti-Semitic reference in this instance, which is later repeated in the "666" section.) 2. Later in Chapter 2, "Therefore, the Jews said to him, 'In forty and six years this temple was built, and shall thou in three days raise it?" (The argument the Jews are raising against Jesus' theology is minimized if scholars only focus on the repeated occurrence of another "six"). 3. In Chapter 19: "14 And it was past eve, as it were the sixth hour. And he says to the Jews, Look, you are king." In the "Apocalypse" chapter there are even more references to the number "six": in relation to going "mad"; blood of the Apocalypse going into a lake: "a thousand and six hundred"; "the sixth angel shed out his viol in that like greet flood Eufratis"; "And the four beasts had every of them six wings"; and amidst these references is the mention "number of the beast; for it is the number of man, and his number is six hundred sixty and six". Readers of "Apocalypse" would have previously read the "John" chapter and would have been likely to associate Jews with the number "six", and thus they would be convinced that Jews represented the "Beast" or the "Devil" hinted at. The Jews were banned from England at around the time when the Wycliffite Bible was published. Christians demonized the Jews instead of recognizing they were reusing the Jews' Old Testament. The Christian authors probably were not familiar with Hebrew numerology, where specific letters of the alphabet are equated with numbers with specific symbolic meanings, and none of these are the "number of the beast".
You have convinced me that the Workshop did not make up this "666" reference, but if you want to seriously study theological numerology, you should look up Jewish scholarship on this subject instead of sliding into the "madness" of these types of "666" references. Also, at least some of the texts I have linguistically tested that have no date on them but are currently judged by critics to be from as early as 1550 matched the Workshop's signatures, and proved to have been created much later; the Workshop backdated and forged some texts, and so some texts believed to have been published pre-Workshop in Britain really should be carbon-dated/scientifically-dated to check their authenticity. Let me know if you have a follow-up question.

704Keeline
Mar 13, 2022, 6:04 pm

>703 faktorovich: "carbon-dated"? Are you aware that radiocarbon dating is only useful for items that are generally older than 500 years? The uncertainty rises significantly with more recent centuries.

Most of the texts discussed in this thread are considerably younger than this.

James

705prosfilaes
Mar 13, 2022, 7:40 pm

>703 faktorovich: I gave my brisk impressions on the subject.

So you stated as fact something that you came up with that you knew disagreed with modern scholarship despite being woefully unqualified to do so. Every letter of the Bible has been fought over; had so gross and blatant a change been made in the KJV, it would be known.

they also erroneously translated earlier versions of the bible

You claim to know who translated the KJV, but you say "earlier versions of the bible" instead of the Greek New Testament, or specifically the Textus Receptus.

I have previously researched the names of demons that the Workshop made up in their different books on demonology and witchcraft;

You can not tell me how the KJV relates to its source materials, but you expect me to believe you've studied all the French, German and Italian manuscripts, both in Latin and in vulgar tongues, that would have to be studied to say with authority which demons came from older works and which ones were invented? Wouldn't a first step in that be looking at how Wycliffe translated names to make sure no demon names were merely unfamiliar versions of standard names?

if you want to seriously study theological numerology,

I don't; you just made an extraordinary claim that was obviously wrong.

Let me know if you have a follow-up question.

You seem unaware of how unreliable we consider you as a source. I don't trust demonology books not to be pulling stuff out of thin air, but I have to work to merely disregard your statement, and not take it as evidence that the demons in those books were all from previous sources. I don't trust you to have looked at The Dictionary of Demons or another source that would have pointed you at Johannes Trithemius's works, among others, nor to have dug up those types of obscure works.

706faktorovich
Mar 13, 2022, 8:44 pm

>704 Keeline: On a brief search, I found this scientific study of carbon-dating that was indeed applied to a book created in the time-range I am referring to: https://news.arizona.edu/story/ua-experts-determine-age-of-book-nobody-can-read#....

707paradoxosalpha
Modificato: Mar 13, 2022, 8:56 pm

>703 faktorovich: But on further pondering, turning letters into numbers, or numbers into symbols is an essential part of the Hebrew Old Testament and its appended books such as Kabbalah;

There is no turning one into the other. Hebrew, like Greek and many other ancient languages, is isopsephic, which is to say that the same set of symbols serves for both letters and numerals. The kabbalistic practice of gematria (divining meaning for names, words, and phrases on the basis of the summed values of their letters) isn't so exceptional and doesn't need to be ghettoized.
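
(To make the arithmetic concrete, a small illustrative Python sketch: gematria is nothing more than summing letter values. The mapping below is the standard Hebrew letter-to-number assignment, with final forms counted at their ordinary values; the worked example is the familiar scholarly one in which the Hebrew spelling of "Neron Qesar" sums to 666, and dropping the final nun, as in the Latin form "Nero", is often cited as the explanation for the ancient 616 variant.)

# Standard Hebrew letter values; final (sofit) forms counted at ordinary values.
VALUES = {
    "א": 1, "ב": 2, "ג": 3, "ד": 4, "ה": 5, "ו": 6, "ז": 7, "ח": 8, "ט": 9,
    "י": 10, "כ": 20, "ל": 30, "מ": 40, "נ": 50, "ס": 60, "ע": 70, "פ": 80, "צ": 90,
    "ק": 100, "ר": 200, "ש": 300, "ת": 400,
    "ך": 20, "ם": 40, "ן": 50, "ף": 80, "ץ": 90,
}

def gematria(word):
    # Sum of the letter values; non-letters (spaces etc.) count as zero.
    return sum(VALUES.get(ch, 0) for ch in word)

print(gematria("נרון קסר"))  # "Neron Qesar" (Nero Caesar) -> 666
print(gematria("נרו קסר"))   # without the final nun -> 616, the ancient variant reading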

> Instinctively, it felt logical
The mutual fungibility of instinct, feeling, and logic seems to be central to your method.

708faktorovich
Mar 13, 2022, 8:58 pm

>705 prosfilaes: I am addressing questions by researching the subjects, whereas you keep searching for some reason to insult me personally. In my "brisk impression" on "666", I did not claim that what I was stating was fact, but instead clearly indicated it was a guess, and I did not specify that "666" was definitely the invention or addition by the Workshop, but rather that it sounded like something they would do because of their invention of demons names etc. (as I explained). There are many blatant changes in the King James version, which I explain in Volumes 1-2, but don't see the point of doing here, since you aren't asking me, but rather just trying to find a reason to reject everything I'm writing. And I said it "felt like" they might have forged earlier or pre-James versions of the Bible, but I do not think this is in fact the case, as the handwriting/ artistry is different in these earlier versions - thus, carbon-dating is needed. My original statement was: "The main published source that mentioned demon names in the Renaissance was “Johann Weyer’s” "Pseudomonarchia Daemonum" (1577). The first English translation of “Weyer” was “Reginald Scot’s” "The Discovery of Witchcraft" (1584). The Workshop used varied spellings of these names to make it sound as if they came from ancient sources, whereas these names never appeared before the Workshop made them up." I explain this in more detail in BRRAM. I am not saying the demon names in the King James Bible were made up by the Workshop, but rather in these separate theological texts that added new demonology, which was later repeated as part of Christian theology. "Dictionary" was published only a few years ago. Trithemius' "Steganographia" was only published in Frankfurt in 1606, or decades after the 1577 and 1584 books I refer to above as the origins of these "demons'" names. Let me know how I can clarify my points better, if you misunderstand them, and want to understand them.

709faktorovich
Mar 13, 2022, 9:16 pm

>707 paradoxosalpha: Yes, each of the letters in Hebrew represents a number, but it would be awkward unless one had memorized the numbers they stand for, so an average reader would have to look up the corresponding number. I did not put this practice of interpreting letters as numbers etc. in a "ghetto".

There is nothing cryptic in my response. If you want to understand how absurd "666" is, read a book like: "The encyclopedia of Jewish symbols" By Ellen Frankel, Betsy Platkin Teutsch (1992), which includes an entry for 613, and explains many simple numbers like "Three" and "Ten". And for the explanation regarding how "666" was one among many forms of Medieval and Renaissance demonizing of Jews read: "The Origin of Satan" By Elaine H. Pagels (1996). I don't want to repeat the points they explain in detail. I really have to spend my time on researching new findings in my current work on translating Verstegan's "Restitution", as I mentioned.

There is nothing instinctual about my purely quantitative computational-linguistic method. But responding to nonsensical questions about "666" and demons requires either some instinctual humor, or writing a book to explain Kabbalistic numerology to an audience that is pretty hostile to believing anything I say, as you guys have said.

710Keeline
Mar 14, 2022, 9:44 am

>706 faktorovich: This is not a “scientific study.” It is a popular-language account by a university journalist for their publication. I did not see information on whether the analysis was submitted to and reviewed by a peer-review scientific journal.

The work in question is hardly a good baseline since it is disputed.

The 1400s estimate by the study puts it in the 500+ year age range where radiocarbon dating is possible but less certain. This account does not mention other “books” which have been dated with the same techniques.

Of course all that was really studied was the age of a sample of the parchment (animal skin). Parchment was reused in some cases. I don’t know what evidence remains when this is done or whether this study looked for it.

I am not an expert on radiocarbon dating. As indicated, it requires the skills of many. But when I read that it is less reliable for young samples, I have to trust that is a reliable statement.

James

711faktorovich
Mar 14, 2022, 10:36 am

>710 Keeline: Dear James: The Old and New Testament Bible versions we have been discussing are currently claimed to be from as early as 650–587 BC through the 10th/11th centuries AD and into the 15th-16th centuries. Thus, even if carbon-dating does not work on newer texts or specifically those published in the last 500 years (a claim I did not see anywhere on a brief review of the evidence), most of these manuscripts can be tested because they are claimed to have been published over 500 years ago. The article is about the research the University of Arizona's NSF Arizona Accelerator Mass Spectrometry, or AMS, Laboratory is doing. It does not get any more "scientific" than science done in a science lab of a tier-1 research university. The article specified they analyzed the "Voynich manuscript". The rest of the manuscript remains when a tiny piece is used, so the rest can be re-tested if there is any doubt. If you are going to argue the "500 years" objection, you have to cite some source for why you believe this is the case, and explain what relevance it has when 2022 - 500 = 1522 and some of the texts I note as misdated are currently dated at around or just before 1522. Since the element measured degrades steadily in materials and does not start degrading only after 500 years, and small variations have to be measured to distinguish items from the 13th vs. the 11th century as well, I cannot imagine there is any basis for a 500-year limit. This is not my area of research, so I have not yet dug further for the precise answer. But it's odd that you seem to have made up an arbitrary (500) number just to counter my simple point that all potentially forged Bible manuscripts should be tested with an unbiased method for their true dates.

712SandraArdnas
Mar 14, 2022, 10:55 am

Apparently, this thread will never die. Long live ignore button

713andyl
Mar 14, 2022, 1:34 pm

>711 faktorovich:

Traditional carbon dating doesn't work for fairly recent dates. Approximately 500 years BP is the accepted horizon; the calibration curves do not really support anything more recent. The problem is that the error bars grow too big, as C14 has a half-life of approx. 5730 years. If you imagine trying to date paper which is 50 years old, you will appreciate that very little C14 will have decayed (not enough to have statistical confidence in a result); by the time you get to 500 years BP you can start to have some confidence in the result.

Also, retesting is not likely to get the same figure (the British Museum did a trial where they tested the same object over a number of weeks and got results back within +/- 1 standard deviation). For the Voynich Manuscript you might find dates anywhere from the late 14th to the early 15th century.
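
(To put rough numbers on the half-life point, here is a small back-of-the-envelope Python sketch; it only shows how little C14 decays over recent centuries and is not a calibration model, which real labs handle with calibration curves such as IntCal.)

HALF_LIFE_YEARS = 5730.0  # approximate half-life of C14

def fraction_decayed(age_years):
    # Fraction of the original C14 that has decayed after age_years.
    return 1.0 - 0.5 ** (age_years / HALF_LIFE_YEARS)

for age in (50, 500, 2000, 5730):
    print(f"{age:>5} years: {fraction_decayed(age) * 100:5.2f}% of the C14 has decayed")
# Roughly 0.6% at 50 years and about 6% at 500 years, so for young samples
# the decay signal is easily swamped by measurement and calibration error.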

714faktorovich
Mar 14, 2022, 9:00 pm

>713 andyl: The problem with the various earliest Bible manuscripts is that they could have been forged centuries later than when they are currently believed to have been written. There is no concrete evidence to support the speculations of 500 BC vs. 11th century AD etc. So, being a century off would not be that significant. It's not a matter of if these manuscripts were written in 1550 or 1520, but rather if they were all forged between 1400 and 1600, without any earlier versions made in Europe. As you explain it, the testing horizon is not a hard 500 years, but rather a lack of certainty to within an under-100-year range for any date within the past 1000 years. Also, testing the oldest manuscripts from other nations would help to establish what the world's actual oldest surviving manuscripts are, excluding those that were forged later to establish a grand early history for places that were barely or not at all literate in those early centuries.

715faktorovich
Mar 16, 2022, 10:55 pm

As I continue working now on the 3rd volume in the second half of the BRRAM series, I have found some still more convincing handwriting analysis evidence for 5 (so far) of the 6 ghostwriters. Here is a link to the file with these handwriting samples in GitHub: https://github.com/faktorovich/Attribution/blob/6dc1430a605b7544d2fb15e8d16a7291.... Sylvester's handwriting is obviously identical to both "Mary Sidney" and "Philip Sidney". I have also found the letters from Jonson and "John Donne" to be uniquely identical in handwriting. And these files give dozens of examples where ghostwriters have the same handwriting as the pseudonyms/ "writers" they are ghostwriting for. I have not found a single Renaissance scholar that has published about these similarities. Most of the "writers" currently credited with texts from the British Renaissance do not have a single signature or piece of handwritten text to prove they wrote a word during their lifetimes. In contrast, these ghostwriters wrote numerous letters and manuscripts with their own names on them; a very rare thing by itself during this period. If nobody in this discussion is willing to consider that my attributions are accurate, it is clear that nobody has followed the links I have provided to files like this one on GitHub that provide overwhelming evidence even without the need to look inside Volumes 1-2 for the writeup.

716prosfilaes
Mar 17, 2022, 10:45 pm

>715 faktorovich: You uploaded a DOCX file to Github? Why? Why? Why? At least print it out and turn it into a PDF.

If nobody in this discussion is willing to consider that my attributions are accurate, it is clear that nobody has followed the links I have provided to files like this one on GitHub that provide overwhelming evidence even without the need to look inside Volumes 1-2 for the writeup.

Sure, if nobody agrees with you, then they must not have looked at your evidence.

I also appreciate that you're handing old handwriting samples to amateurs and expecting us to find that convincing proof of anything, as if we were experts in handwriting analysis. On the other hand, when it comes to the earliest Biblical texts, like the Ketef Hinnom scrolls, which is where the 650–587 BC you mention apparently comes from, "There is no concrete evidence to support the speculations of 500 BC vs. 11th century AD etc.", despite that date coming in part from experts in ancient Hebrew handwriting dating the writing used. Handwriting styles are conclusive evidence when you want them to be, but not even "concrete evidence" when you don't want them to be.

>708 faktorovich: There are many blatant changes in the King James version,

See, now you're coming into a field I've studied a bit in. I've read books by people who believe the KJV is the one true bible, and compared it to more modern versions. Every difference between the KJV and more modern versions was listed as a reason to reject them. Guess what? Except for the Johannine Comma and the long version of Mark, both issues with the texts used to translate the KJV Bible, it all comes across as a bunch of shouting; the KJV says basically what the modern versions of the Bible say, just in Elizabethan English. If there were blatant changes, they would be used in the argument; KJV-onlyists would argue that they are proof the new versions are wrong, and modern translators would use them as proof the KJV was unacceptable. Catholics would have wielded it as a weapon against the Anglican Church and English Protestants as a whole, comparing it to the Vulgate and later the Douay-Rheims.

So what was a blatant change in the King James version?

The Workshop used varied spellings of these names to make it sound as if they came from ancient sources, whereas these names never appeared before the Workshop made them up.

Which is sheer opinion, stated as fact. Motives are hard, especially when you're talking about stuff done in secret.

Trithemius' "Steganographia" was only published in Frankfurt in 1606, or decades after the 1577 and 1584 books I refer to above as the origins of these "demons'" names.

Either the names are found in "Steganographia", and they're not made up, or they aren't found there, in which case they may have come from some other source. It's hard to prove that a work was the original source for a name or concept; you may just have not looked in the right source.

"Dictionary" was published only a few years ago.

Right, that's called a reference work, and can tell you when and where the first appearance of a demon might be from. You can verify the statement, but a modern reference work can tell you where to look.

717faktorovich
Mar 18, 2022, 9:46 pm

>716 prosfilaes: You don't have to "print out" a document to turn it into a PDF. There is an automatic function to do this. By turning a document into a PDF, one prevents individual images of letters etc. from being saved separately by future researchers, and otherwise inhibits the full accessibility of a file for searching etc. The docx format has been standard for a couple of decades. What is troubling you about this simple and correct upload?

It is absurd to imagine that any handwriting expert can identify the age of a document to the year or even to a given century. The best imaginable handwriting expert can only establish when handwriting styles of a group of samples match, so that a single hand appears to be behind them. When all of these samples were written is not apparent from the handwriting alone. Even if one of the texts in this cluster of matching handwritings had a specific "587 BC" date on it (a bit funny since there was no "BC" before Christ, so it would be in the Greco-Roman calendar), it would be impossible to prove if the entire cluster was the work of a ghostwriter/forger working centuries or millennia later, who intentionally added this date to authenticate the entire cluster as being from this ancient, and thus uniquely profitable for forgery-sellers, period.

What about just opening the file with the handwriting samples I provided, and saying maybe a single word that proves you opened it and considered this evidence. It is self-explanatory, so if you had opened it, you would not be suggesting only an expert can see the similarities within the handwriting clusters I am providing.

What? You are comparing the King James Bible with modern versions to check if KJB made significant changes from the pre-KJB versions? I am saying that the Workshop heavily edited the Bible from Middle English, Latin, Hebrew etc. earlier versions to turn it into Early Modern English version that is much closer to the modern standard Bible version than any of the predecessors. Who are these Catholics and Protestants you imagine would have argued against the KJB translation? You understand that my computational-linguistic (and handwriting, etc.) findings are saying that Verstegan ghostwrote pretty much all of the English sermons, and most of the theological Protestant and Catholic pamphlets from around a century of British history, without any rival opinions being allowed into the monopolized publishing world in English (and with a smaller dominance in other languages like Latin)? So Verstegan would have had to sue himself for anybody in England to have claimed KJB was too heavy-handed a translation to be the official version. Seriously... don't you want to at least open the files I have posted on GitHub to kind of know what I have been talking about? I'm working on translating Verstegan's "Restitution" now, and in this translation I am explaining all of the ways Verstegan made up myth about Britons' "Anglo-Saxon" origins, and various surrounding topics. I am not translating KJB because it has been translated before, so it is relatively accessible, whereas "Restitution" has never been translated before, despite being the basis for the "Anglo-Saxon" concept. And you are asking me to run a full analysis of KJB to give you a list of all of the changes in it, or the changes I have noted on so far in BRRAM?

When I refer to made-up ancient sources, I am not using a generalization, but rather specific instances I am explaining in the "Restitution" translation. I am just not able to quote thousands of words from my draft in this discussion, and that's why I seem to be using general references. So, email me, and ask for a review of the first 14 volumes of BRRAM, and then read all the evidence I have published so far for yourself. I don't know what pieces are of interest to you, and taking references to KJB out of context would only confuse you further.

Yes, it is indeed possible that Verstegan and other members of the Workshop just made up the names of demons, Britain's "Anglo-Saxon" origins (via "Queen Angela" as Verstegan argued under the "Puttenham"-byline), etc., etc. Again, explaining all this in a post, even a very long post is clearly not helping you understand what I am referring to. I have found numerous mistakes in modern dictionaries, such as missing by decades the first appearance of a word etc. "Restitution" is a dictionary; thus, I am currently working on explaining how never before used words and histories were added to some of England's first dictionaries. It is amazing that you can carry on finding faults with my research without reading any of it. It must be a confusing discussion for you.

718Keeline
Mar 19, 2022, 4:02 am

On many programs and operating systems, the creation of a PDF is handled in the print dialog box. It sends the file to a driver that looks like a printer to the computer. The Mac has been much better at this since 2000, but Windows has had it for about a decade without resorting to installing a third-party PDF printer driver.

For a final work, PDF is a preferred distribution method. Locking in the formatting, fonts, and content is useful, especially when one is only trying to read. PDFs from applications like Word are searchable. Even PDFs which have images of text pages can be made searchable with OCR techniques.

For either Word docx or PDF, any embedded images are reduced in size and quality to match the target output resolution. Usually this is much lower than the original kept as a separate image file. So one of your stated goals is not achieved.

The main reason to distribute a docx is to allow review comments with Track Changes. Though good PDF programs allow for highlighting and comments.

Microsoft doc and docx have been around for a while and some other programs can read and write to them. But they are still proprietary formats, not an open standard like PDF which has been around for at least 25 years and probably longer. Certainly since 2000 there have been many non-Adobe programs which can read and write them.

It is not a safe assumption that all potential readers will have Word or an alternate program that can use the file well. OpenOffice is a good alternative, but why make things hard? PDF was designed for the purpose, and any computer or smart device since the 1990s can probably read it with built-in tools.

James
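
As a concrete illustration of the OCR point above, here is a minimal Python sketch that shells out to the ocrmypdf command-line tool to add a searchable text layer to a scanned, image-only PDF; it assumes ocrmypdf is installed, and the filenames are hypothetical.

import subprocess

# Add an invisible, searchable text layer on top of the scanned page images.
# "scanned_gospels.pdf" and "gospels_searchable.pdf" are placeholder names.
subprocess.run(
    ["ocrmypdf", "scanned_gospels.pdf", "gospels_searchable.pdf"],
    check=True,  # raise an error if ocrmypdf fails
)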

719faktorovich
Mar 19, 2022, 12:19 pm

>718 Keeline: When I had an error with my Microsoft Word a few years ago, I installed LibreOffice Writer - this is a free program that allows for reading, editing, writing etc. .doc/.docx and various other file types. Thus, there is absolutely nobody on the planet who cannot open and manipulate a .docx file for free. There are a few other free Word-equivalent programs out there as well. You cannot save images out of a pdf, so it is not suitable for a file that includes dozens of separate images that the reader might need to re-use in their own blogs/ articles. The pdf transformation is made deliberately to prevent the borrowing of images etc. or to make it more difficult to plagiarize the precise formatting of a file. The size of the file is irrelevant if a user would just open the file to see what it says before deleting it from their computer. Shrinking the file deliberately during the pdf-creation process is likely to decrease the resolution of the images, and the images need to be at their max for viewers to see the precise traits of a handwriting style. The question of .docx vs. pdf is entirely irrelevant to this discussion, and you guys must really be trying to avoid the overwhelming evidence inside this particular handwriting .docx to spend this much effort on discussing this trivial point instead.
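
For what it's worth, the same free LibreOffice suite mentioned here can also convert a .docx to PDF from the command line without any "printing" step. A minimal Python sketch, assuming LibreOffice is installed and using a hypothetical filename:

import subprocess

# Headless conversion: no GUI and no printer driver involved.
# "handwriting_samples.docx" is a placeholder filename.
subprocess.run(
    ["soffice", "--headless", "--convert-to", "pdf",
     "--outdir", "converted", "handwriting_samples.docx"],
    check=True,
)
# The result appears as converted/handwriting_samples.pdf.

Whether embedded images get downsampled depends on the export settings, so this sketch does not by itself settle the resolution question debated in this thread.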

720prosfilaes
Mar 20, 2022, 5:02 pm

>717 faktorovich: By turning a document into a PDF, one prevents individual images of letters etc. from being saved separately by future researchers, and otherwise inhibits the full accessibility of a file for searching etc.

False. I could pull out the images from a PDF much more easily, and as has already been pointed out, PDFs are easy to search. And Github is built around a tool that shows changes in text-based documents, so the optimal thing is to write the document in HTML, which would be maximally easy for everyone to use. Instead you saved it in an opaque binary format.

>719 faktorovich: You cannot save images out of a pdf,

You can just run pdfimages on the file. If you're concerned about the image quality, you could upload the image files themselves.
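
A minimal sketch of what "run pdfimages on the file" looks like in practice, assuming poppler-utils is installed; the filename is hypothetical. pdfimages copies out the embedded images exactly as stored in the PDF, with no screenshot or re-encoding step.

import os
import subprocess

os.makedirs("extracted", exist_ok=True)

# -all keeps each image in its native stored format (JPEG, PNG, etc.).
# "gospels_1571.pdf" is a placeholder filename.
subprocess.run(
    ["pdfimages", "-all", "gospels_1571.pdf", "extracted/img"],
    check=True,
)
# Files land in extracted/ as img-000.jpg, img-001.png, and so on.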

It is absurd to imagine that any handwriting expert can identify the age of a document to the year or even to a given century.

Proof by assertion. If you hand me an English language book, I can generally open it up to a random page and date it to within a century, and could probably do better with practice, just looking at the fonts and layout. Long-s, for one, disappears pretty sharply at the end of the 18th century.

it would be impossible to prove if the entire cluster was the work of a ghostwriter/forger working centuries or millennia later,

There's no evidence that forgers working centuries or millennia later would have known the scripts used centuries before. In the premodern era, such knowledge was not widespread, and such knowledge wouldn't be needed to convince a contemporaneous scholar.

What about just opening the file with the handwriting samples I provided, and saying maybe a single word that proves you opened it and considered this evidence.

I looked at it. It's a bunch of scribbles that I can't read. I certainly can't say how tight the penmanship training was at the time, and what's likely to vary between different samples of one person's handwriting and what is reliable evidence. The notes in the document are less than helpful; for the most part I can't even tell sources or what's being compared.

the modern standard Bible version

There are any number of modern standard Bible versions, the New International Version (NIV), the New Revised Standard Version (NRSV), World English Bible (WEB), English Standard Version (ESV), Revised New Jerusalem Bible (RNJB), New Living Translation (NLT), etc. One 2014 survey gave 55% of the respondents using the KJV, 19% using the NIV, 7% using the NRSV, 6% using the New American Bible and 5% using The Living Bible. There is no one modern standard Bible version.

The implication that all those works would just carry forward the changes of the KJV is insulting and absurd; it would mean that hundreds of translators of every Christian religious tradition on the Earth, many priding themselves on working with the latest critical editions of the original Greek, Hebrew and Aramaic, never noticed that the KJV translation doesn't match the original.

Who are these Catholics and Protestants you imagine would have argued against the KJB translation?

How about https://babel.hathitrust.org/cgi/pt?id=uc1.c2752852 , for a concrete example? What about the Catholics on the continent, who would have kept track of what the Anglican Church was doing, and would have jumped at blatant errors in the KJV? What about German Lutherans and Mennonites, immigrating to the US and encountering this new English Bible that seemed strangely mistranslated compared to the German Bibles they were familiar with?

And you are asking me to run a full analysis of KJB to give you a list of all of the changes in it, or the changes I have noted on so far in BRRAM?

I'm asking you to show me one change that the KJV made vis-a-vis the Textus Receptus that is clearly a willful change.

"Restitution" has never been translated before, despite being the basis for the "Anglo-Saxon" concept.

In your opinion. The rest of the world might point to the Ecclesiastical History of the English People, but you know better.

I have found numerous mistakes in modern dictionaries, such as missing by decades the first appearance of a word etc.

And? An unabridged English dictionary takes forever and a huge amount of money. Despite what the name might imply, the OED2 didn't touch most parts of the original OED1, meaning unless you're looking online at a part the OED3 revision has reached, or a relatively new word, you're looking at text that was last updated between 1888 and 1928. Before computers, before even widespread library use of microfilm and microform. You have tools the people who wrote those dictionaries couldn't even dream of.

But the relevant question is the reverse. It is impossible to tell, in most cases, whether you have the first use of a word. You can't tell whether a work made up a demon, or whether it was copied from a Slovene manuscript tradition which managed to make its way to England in one manuscript, and it along with all its kin are now lost or at least undigitized, or perhaps even online at some Slovene museum you've never heard of.

721faktorovich
Mar 20, 2022, 9:03 pm

>720 prosfilaes: Yes, there are indeed some features in printed books' font type that changed between the centuries or even between decades, including the "long-s". I just spent most of the day writing an introductory section that explains the variants in handwritten and printed Old English/"Saxon" versus Early Modern English blackletter, roman etc. fonts. The reason all texts claimed to have been handwritten or printed before 1550 have to be tested is to check for forgery. In a forgery, the "long-s", Old English characters, or any other feature that is believed to be representative of a given age could have been forged to imitate it in a single document or in an entire cluster of documents. Dating the paper avoids the possibility of a massive forgery that has resulted in the misunderstanding of a giant chunk of human history.

"There's no evidence that forgers working centuries or millennia later would have known the scripts used centuries before." This is exactly what I was researching today. Proof that Verstegan was closely familiar with the Old English script is in "The Gospels of the Four Evangelists" (London: John Daye, 1571). This book has been digitized; take a look at the second page at the "Saxon Characters or Letters" diagram that explains the equivalent Saxon and English characters.

You are saying the handwriting samples are "a bunch of scribbles"? These are the best-known handwritten documents from the Renaissance, from "Shakespeare" to "Elizabeth I" to "Sir Raleigh", etc. etc. If you think all of them wrote in unreadable scribbles... Well, that's a radical opinion. Each of the writing samples is accompanied by very specific bibliographic entries that identify their sources. If you don't understand such citations, there is no way to simplify these for you.

The KJB was the first complete translation into English of both the Old and New Testaments; the preceding versions were of fragments in different languages (Old English, Latin, Hebrew, etc.). As the English language became more modern and less "Early Modern English", the number of translators fluid in Old English shrunk to zero or close to it. How long do you think it would take you to learn Old English to translate even a single psalm from it into Modern English without referring to previous translations? It is much easier for a translator to check KJB and imitate it, than to go back to decipher dead languages. As the first full translation, it is entirely different in all the words from the Latin or German or Hebrew versions that preceded it. The spelling of every word would have also been changed between the Old English and the Early Modern English versions, and the structure of each sentence would have been reworded. I doubt there is a single sentence that you could quote from any earlier version and KJB that is identical.

Verstegan's "Restitution" used "Bede's" "Ecclesiastical History" as one of its historical sources, but "Bede" does not in any way define the "Anglo-Saxon" concept.

Yes, there are modern tools that simplify the process of checking for errors in dictionaries and in translations, etc. That's why I decided this is the perfect time to create a Modernizing series: BRRAM.

You are imagining the fictional destruction of an ancient manuscript? If there was an ancient text that named dozens of demons etc. for the first time in Catholicism, and you were a historian in let's say 1570 when you transcribed these names into your textbook; can you imagine under what circumstances you would have destroyed or allowed to be lost this precious and expensive ancient document, while preserving your own handwritten scribbles, copies of your printed book, etc. etc.? If you step away from fantasy, and into reality; then, you have to accept that if no ancient document exists that can be proven as an authentic source; then, it is far more likely that a "historian" who first named such demons was just writing fiction.

722Keeline
Mar 21, 2022, 1:32 am

When "font" is used to apply to letterpress printing, it is one size and style of a particular typeface.

Computer usage, particularly since the Mac of 1984, uses "font" in an expanded definition that is not as specific. However, if you are going to refer to old books, I would suggest using the proper term for clear communication.

Here is one of many pages with the definition I outlined above:

https://snowball.digital/blog/what-is-the-difference-between-a-font-and-a-typefa...

James

723anglemark
Edited: Mar 21, 2022, 5:41 am

>719 faktorovich: " You cannot save images out of a pdf"

Do you know how to right-click?



(ETA: This is Johan commenting)

724faktorovich
Mar 21, 2022, 8:42 am

>722 Keeline: The link you have provided and what you are saying is entirely irrelevant to what I explained. The "blackletter" font is a font: https://en.wikipedia.org/wiki/Blackletter

725faktorovich
Mar 21, 2022, 8:49 am

>723 anglemark: Most standard pdf files, such as "Gospels" (1571), do not have the option to "Copy Image" or "Save Image". You need a special program beyond Adobe Acrobat Pro or special steps to save images out of a pdf. For example, you can take a screenshot of the image, or save the page separately, open it in Photoshop and then cut it out and save it as a jpeg. However, all of these options lead to a decrease of resolution, pixelation, or other problems that decrease the quality of the image. In contrast, the image is the original unshrunk image when it is saved in and taken out of a .doc/x document. Yet again, you guys are digressing into absolutely irrelevant subjects because you have no way to counter my arguments about the thing this discussion is about or "Who Really Wrote the Works of the British Renaissance?"

726anglemark
Mar 21, 2022, 9:18 am

>725 faktorovich: "Yet again, you guys are digressing into absolutely irrelevant subjects"

No, you don't need Acrobat Pro to save images from a PDF. Ordinary Acrobat Reader works just fine. I just downloaded the Gospels (1571) PDF from Open Library and copied some images in Acrobat Reader on my Mac. I think the digressions are because whatever subject you mention, you don't seem to know very much about it. It's not attempts to digress, it's "Look, not even in this are you correct".

727prosfilaes
Mar 21, 2022, 7:38 pm

>721 faktorovich: "There's no evidence that forgers working centuries or millennia later would have known the scripts used centuries before." This is exactly what I was researching today. Proof that Verstegan was closely familiar with the Old English script is in...

You were researching today, but nonetheless, we should trust you over people who have been researching the subject all their lives. We aren't talking about whether someone in the British Renaissance could read the Old English script; we're asking if someone in the 11th century could read and convincingly write Paleo-Hebrew.

As the English language became more modern and less "Early Modern English", the number of translators fluid in Old English shrunk to zero or close to it. How long do you think it would take you to learn Old English to translate even a single psalm from it into Modern English without referring to previous translations?

What on Earth are you talking about? The original texts of the Bible, as close as we have them, are in Ancient Greek, Hebrew and Aramaic. Old English has nothing to do with it.

It is much easier for a translator to check KJB and imitate it, than to go back to decipher dead languages.

So one person could translate the Bible in Renaissance England, but if a couple dozen say they've done it in the 20th century, they must be lying.

Do you understand what translation is? That it's something done regularly, and often because the translator thinks there's no satisfactory version preexisting? That you're accusing hundreds of translators of being too lazy to do their jobs? A job you actually trivialize by saying the work of the KJV is done by one person?

And again you ignore the point that so many people would notice any major mistranslation in the KJV, Catholics fluent in Latin and the Vulgate, Jews fluent in Hebrew eager to point out Christian misinterpretation of the Hebrew Bible, immigrants to Anglophone nations who knew the Bible in their native tongue, Boers eager to score points against the invading Englishman.

Again, this should be simple to establish; name one part of the KJV that was mistranslated, so I can compare it to independent translations.

> can you imagine under what circumstances you would have destroyed or allowed to be lost this precious and expensive ancient document

Fire. Flood. Rodents. Death with heirs that don't care.

if no ancient document exists that can be proven as an authentic source

You're overestimating the survival rate of ancient documents, as well as overestimating the percentage you have actually read.

>725 faktorovich: you guys are digressing into absolutely irrelevant subjects

It's interesting how many subjects you claim to be an authority on. You annoyed me when you treated GitHub as a place to dump a doc file, instead of paying attention to the culture, which would use plain text/Markdown or HTML, or things like TeX or Texinfo. The Bible stuff is just something I know well enough to see that your arguments are absurd and a bit insulting to the people who have spent their lives translating the Bible.
----

Let me repeat this: name one part of the KJV that was mistranslated, so I can compare it to independent translations. It's a simple test of your claim.

728faktorovich
Mar 21, 2022, 8:40 pm

This message has been flagged by multiple users and is therefore no longer displayed.
>726 anglemark: Why haven't you shared this image you have created by copying it in Acrobat Reader? You absolutely could not have "copied" images from "Gospels" in Reader as #1 the images in it are not separate from the text, so they cannot be automatically "copied". There is also no "copy" option in a simple pdf like this in Reader. You are just saying a bunch of nonsense. If you lie in your attempts to prove somebody else incorrect; then, you are really only proving yourself to be a liar.

729faktorovich
Edited: Mar 21, 2022, 9:29 pm

>727 prosfilaes: If somebody spends their entire life studying anything without understanding or explaining what it is they have been studying truthfully and correctly; then, they have been wasting their time. It is more rational to have spent 3 years as I will have done in a few months on this BRRAM series to reach the precise truth on the subject, and then to move on to other related subjects that have not been previously understood.

Neither you nor I have previously specifically mentioned "11th century" researchers' knowledge of "Paleo-Hebrew". What I was talking about is if somebody in the Early Modern English period could forge "Old English"/"Saxon"/"Old German" font/texts.

By "Old English" versions of the Bible I am referring to King Alfred's, Wessex Gospels etc. You guys previously quoted from "Old" and pre-King-James-Bible "Early Modern English" versions of the Bible earlier in this discussion. In a few of the sources I reviewed today, scholars have explained that the earliest form of "Old English" was identical to, or was indeed the same as "Old German". You mentioned that some of the earlier versions of the Bible were in German. What about all this is confusing you? You have not quoted a single letter in Hebrew or Ancient Greek - you have been quoting from variedly recognizable versions in English, such as the "Wycliffe" Bible in Middle English.

In the Renaissance, Latin was a native or first-language taught to children. There are no speakers of Latin in our modern world, and thus translating from Latin would be infinitely more difficult today than it was in the Renaissance. Same for Ancient Greek, and pretty much all of the variants in which the earliest versions of the Bible were written including the Paleo-Hebrew of the first manuscripts of the Old Testament.

No, as I mentioned before Verstegan was assisted in a secondary capacity by Harvey in translating the King James Bible; he did not do it alone; there were two of them. Why would I care if other translators are lazy? Why do you care? I am just pointing out the truth that there are inaccuracies and falsehoods in the history we are taught in school, and that these errors can be corrected with modern tools. I am going about the work of making these corrections, while you are wasting your time in complaining that by doing so I am proving all preceding historians and translators to have been too "lazy".

I learned Hebrew in elementary and middle school, so I know there are speakers of Modern Hebrew, but translating the "Paleo-Hebrew" variants would be an extreme challenge even for a native Modern Hebrew speaker. There have been plenty of Hebrew-fluent Jews who have pointed out mistakes in the Christian translations of the Bible; if you are not familiar with what they have said...

To fully point out a mistranslated part of the Bible, I would have to quote in Paleo/Hebrew, Latin or Ancient Greek; then, I would have to explain the linguistic errors in the various translations that have followed. And then, you are saying you are going to take all of that work (I will have invested without needing it for my BRRAM), and you are going to compare it to the "modern" translations? Do you know or have you researched Hebrew, Latin or Greek (I am a native Russian speaker, which uses the Cyrillic alphabet, which is derived from Greek; so Greek is not as foreign to me as it is to you; and my PhD dissertation was partially on the linguistics of dialects and variants of English including Scottish/ Gaelic; and I took Italian in high school, and Italian is descendant from Vulgar Latin)? How do you imagine you are going to understand or correct me, if you are having trouble understanding the difference between "Middle English" and "Latin" versions of the bible? If you can explain how you plan on testing if I am wrong in pointing out the errors in advance, I am going to go ahead with explaining the error(s) to you. I am just curious if you are seriously concerned about biblical translation or if you keep asking the same request just because I have ignored it before, and you think it sounds like I am saying I can't find any errors.

So you envision an army of rodents that only ate the original book about the demons, and left the book that was based on that book about the demons alone?

730prosfilaes
Mar 21, 2022, 11:56 pm

>729 faktorovich: If somebody spends their entire life studying anything without understanding or explaining what it is they have been studying truthfully and correctly; then, they have been wasting their time.

And how should one know if they have been studying truthfully and correctly?

Neither you nor I have previously specifically mentioned "11th century" researchers' knowledge of "Paleo-Hebrew".

>714 faktorovich: There is no concrete evidence to support the speculations of 500 BC vs. 11th century AD etc.

I'm guessing you grabbed some numbers out of the air or out of context, but the only thing the 500 BC could refer to in the Biblical context is the Ketef Hinnom scrolls.

In a few of the sources I reviewed today, scholars have explained that the earliest form of "Old English" was identical to, or was indeed the same as "Old German". You mentioned that some of the earlier versions of the Bible were in German. What about all this is confusing you?

That incredible non-sequitur, for one.

I am going about the work of making these corrections, while you are wasting your time in complaining that by doing so I am proving all preceding historians and translators to have been too "lazy".

No, I'm complaining that you're calling all preceding historians and translators lazy instead of understanding that most of them were very diligent, and that their errors are going to reflect that. You assume that because it's hard to translate from Ancient Greek and Hebrew, they didn't do it, instead of assuming they tried their best with the knowledge they had.

To fully point out a mistranslated part of the Bible, I would have to quote in Paleo/Hebrew, Latin or Ancient Greek;

I've pointed out a couple of times what languages the Bible was originally written in, but you still don't seem to know. It's not written in Latin, but there are parts written in Aramaic.

If you can explain how you plan on testing if I am wrong in pointing out the errors in advance, I am going to go ahead with explaining the error(s) to you. I am just curious if you are seriously concerned about biblical translation or if you keep asking the same request just because I have ignored it before, and you think it sounds like I am saying I can't find any errors.

I don't care about errors in Biblical translation. You claimed that the KJV was deliberately mistranslated and those changes were carried down to the present. That is the claim that I dispute. If you give me a simple Biblical reference, less than 10 characters, I can compare the translation in KJV to modern English versions, older English versions (Wycliffe, the Geneva Bible), and German, Spanish, Italian, and other older translations. It would be nice to go back to the original languages, but for the claim, I don't really need to.

You could have given me a ten character reference, and I could have checked it. Instead you gave four thousand characters that buttressed your claim not one bit.

731prosfilaes
Mar 22, 2022, 12:01 am

>729 faktorovich: I am a native Russian speaker, which uses the Cyrillic alphabet, which is derived from Greek; so Greek is not as foreign to me as it is to you;

I originally skipped this as a stupid argument; learning a small alphabet is hardly the hard part of learning a language. But then something clicked, and I remembered that the Latin alphabet is also derived from Greek, making this an amazingly stupid argument.

732lorax
Mar 22, 2022, 10:04 am

I so rarely get to use the phrase "fractally wrong" - wrong at every possible level of detail, no matter which aspect of something you start looking at more closely you only see more levels of wrongness - but this thread does bring it to mind.

733faktorovich
Mar 22, 2022, 12:17 pm

>730 prosfilaes: Indeed, those who are not studying the truth correctly will never know if or how they were in error.

"I'm guessing you grabbed some numbers out of the air or out of context, but the only thing the 500 BC could refer to in the Biblical context is the Ketef Hinnom scrolls."

The above is a great example of how you take any valid point I raise and turn it into a problem by merely insulting me and making it sound as if insulting the opponent means whatever they were arguing can be discounted. Yes, the "Ketef" manuscript is claimed to have been created around 500 BC, and that's the one I meant. If I had written a longer paragraph to explain the nature of this manuscript and why the dating is obviously incorrect, you would have found a problem with me writing in long sentences or using words that are too long or some other irrelevant nonsense.

I am not talking about abstract errors of abstractly lazy researchers. I have pointed out errors of previous researchers in every annotation/ introduction/ sentence of the BRRAM series. For starters, I have re-attributed 284 Renaissance texts with my 27-test computational-linguistics method. But this is only the start; I explain translation/ linguistic/ evidentiary/ handwriting analysis and various other types of errors previous researchers made throughout. One basic mistake you can start with is that you still have not actually read any of the 14 published volumes of BRRAM, despite my offer to send free review copies. Not reading the details of an opposing argument when arguing against it is in itself an error. You guys have not even managed to look at the pictures in the handwriting-illustrations file I linked to. Nobody should care if researchers "tried" or were "lazy"; all that matters is the final result, and if the attribution of this entire century has been incorrect; then, past researchers' results in this field have been atrocious.

Only tiny fragments survive in the Aramaic. Other than the Greek Septuagint, the earliest full surviving version of the Bible is the Vulgate, which is in Latin.

"I don't care about errors in Biblical translation." Why are you engaging in this discussion if at the center, you do not care about the process of translation, nor understand how it works. The process of translation changes the text completely each time it is done from one language into another. "Deliberately mistranslated"? All translation is deliberate, and the error is in the eye of the beholder. If a translator has a different philosophical/theological perspective from the original author, they will instinctively make changes even without intending them. But yes, there are many cases where the Renaissance Workshop made alterations to fit with their own theological beliefs, propagandistic needs, political climate, and various other drivers. In the previous review of the "666" passage, you quoted the lines from the different Bible variants, but you did not read or analyze any of them closely enough to grasp why it is referring to the number "666" as evil. In response, I explained to you how you should go about studying numerology. Why don't you go back and re-read the fragments you have already quoted to see the errors or changes each of these translations has made to this single passage. If you don't notice how the biases of these translators shifted the meaning; then, you have not spent enough time on reading and too much time in repeating the same request for me to quote any random lines from the Bible. I would have to explain the errors to you in the "666" passages or in any other word or sentence I found to be in error, and I would have to probably spent at least 8,000 words on the explanation to explain the errors between the first manuscript and the latest biblical variant. So, if you "don't care" about these details, you should move on to reading about fictional demons, or whatever else floats your interest.

734faktorovich
Mar 22, 2022, 12:20 pm

>731 prosfilaes: Yes, the languages of the world connect into an interrelated tree. The branches diverge into the Latin and the Greek-derived branches; somebody who is fluent in one of the languages in the Greek branch will have an easier time understanding Greek than somebody who has never veered outside of the Latin branch.

735faktorovich
Mar 22, 2022, 12:29 pm

>732 lorax: Nothing can be "wrong at every possible level of detail" because, for example, even if you say the "sky is red"; if you look at every pixel of the sky at every hour of day or night or on every day of the year, you will find spots in the sky that are "red". It would take an incredible level of complexity to make a statement that is in fact absolutely wrong. And since BRRAM includes 14 books in it so far, it would be impossible to find something wrong with every one of the 2 million or so words in this series. And the term "fractally wrong" is only listed in the "Urban Dictionary". Merriam-Webster defines "fractal" as "any of various extremely irregular curves or shapes for which any suitably chosen part is similar in shape to a given larger or smaller part when magnified or reduced to the same size". So the "Urban Dictionary" definition for "fractally wrong" is incorrect, as it states, "Being wrong at every conceivable scale of resolution", but "fractally" does not mean "zooming in", but rather that the shape is such that the part of the shape is similar to a larger part. To be indeed "fractally" wrong is to have the same type of error with a tiny part of a series such as its single word, and a giant part of the series such as a whole volume; this is nonsensical, as a word and a whole volume would have entirely different types of errors in them (if there were any errors). There, now I have explained a linguistic error, even though I did not use one from the Bible.

736lorax
Mar 22, 2022, 12:36 pm

I'm beginning to wonder if this isn't a case of fractal wrongness after all, but a classic troll in the original Usenet sense of the word - someone posting stuff they don't believe at all, increasingly ludicrous, to see who will take the bait and spend time arguing with ludicrous propositions like "learning a language basically is just learning the alphabet it's written in" or "there's no possible way two different people in close communication could have similar writing styles".

If so, it is a masterfully done example.

737faktorovich
Mar 22, 2022, 9:01 pm

>736 lorax: You mean in the Cambridge Dictionary denotation of troll: "someone who leaves an intentionally annoying or offensive message on the internet, in order to upset someone or to get attention or cause trouble"? I have noticed that, as you say, those who have been responding to me in this discussion 1. "don't believe at all" what they are claiming to believe; 2. are making statements that are "increasingly ludicrous", as if they have stopped trying to find any rational objections to my "Re-Attribution" of the Renaissance, and are now just lazily grasping at anything negative that can be said about word-processing, or any minor word or point that is mentioned somewhere; 3. and it seems the point is to digress into trivial topics to distract readers' attention from the actual history-changing research I have done and onto points that seem silly and inconsequential. I do object to these nonsensical trollings being "masterful". Somebody who just wants to understand my research can just follow the link I provided to BRRAM to read what it's about for themselves. Those who will be convinced scientific findings are insignificant just because trolls are discussing insignificant points are not the intended audience for BRRAM.

738prosfilaes
Edited: Mar 23, 2022, 12:12 am

>733 faktorovich: Nobody should care if researchers "tried" or were "lazy"; all that matters is the final result, and if the attribution of this entire century has been incorrect; then, past researchers' results in this field have been atrocious.

If. Or it could be that history has been a pattern of very intelligent people standing on each other's shoulders and for all their mistakes, their results have been generally good.

"I don't care about errors in Biblical translation." Why are you engaging in this discussion if at the center, you do not care about the process of translation, nor understand how it works.

That's not what I said. I do care about the process of translation; I do understand how it works. I even engage in it, at a very minor level. But I don't care to fuss over theology driven arguments about the proper way to translate something. I'm marginally interested in unquestionable errors in the KJV, as they could resolve the question of how much dependence later translations had on the KJV.

"Deliberately mistranslated"? All translation is deliberate, and the error is in the eye of the beholder. If a translator has a different philosophical/theological perspective from the original author, they will instinctively make changes even without intending them. But yes, there are many cases where the Renaissance Workshop made alterations to fit with their own theological beliefs, propagandistic needs, political climate, and various other drivers.

You're evading. In >708 faktorovich:, you said "There are many blatant changes in the King James version"; now suddenly the error is in the eye of the beholder. That's what I'm really interested in: times where the translators of the KJV looked at the texts before them and wrote something in English that they knew didn't correspond to the original. The 666 thing would have been a great example; if the other texts hadn't mentioned 666, then it would have been a clear change.

Only tiny fragments survive in the Aramaic. Other than the Greek Septuagint, the earliest full surviving version of the Bible is the Vulgate, which is in Latin.

This is false; there are complete copies of the original Greek New Testament and Hebrew Old Testament (including the Aramaic parts of the latter).

>734 faktorovich: Yes, the languages of the world connect into an interrelated tree.

Modern linguists believe there's no way to tell if all languages are interrelated or not, that there is simply not enough information about languages before they were written.

The branches diverge into the Latin and the Greek-derived branches; somebody who is fluent in one of the languages in the Greek branch will have an easier time understanding Greek than somebody who has never veered outside of the Latin branch.

The branches diverge a lot, but Hellenic languages and Romance languages are two branches, and someone fluent in Modern Greek or Ancient Greek or one of a handful of closely related languages will find the other languages in that family easier to learn than someone who only knows a Romance language. However, neither English nor Russian is part of either branch.

739anglemark
Edited: Mar 23, 2022, 7:42 am

>734 faktorovich:
"Yes, the languages of the world connect into an interrelated tree. The branches diverge into the Latin and the Greek-derived branches; somebody who is fluent in one of the languages in the Greek branch will have an easier time understanding Greek than somebody who has never veered outside of the Latin branch."

Do you stand by that claim, or did you type too quickly (which happens to us all sometimes)? I ask because it's such a fundamentally incorrect assertion. Languages of the world are traditionally grouped into "families" such as the Sino-Tibetan languages, the Afro-Asiatic languages, the Papuan languages, the Uralic languages, and the Indo-European languages. Latin and Greek are two "branches" in the Indo-European language family tree, or, to be more exact, Italic and Hellenic are branches which then branch out further, such that Latin appears on the Italic branch and Greek on the Hellenic. The absolute majority of the world's languages (including, as prosfilaes points out, Russian and English), are not part of either of those two branches. This is the traditional model, which is debated (as prosfilaes also mentions). But to be absolutely clear, there is no new or competing theory that places Latin and Greek as universal "parent" languages of all other languages.

If you meant to say "writing system" rather than "language", it is also not true that all writing systems in the world derive from Latin or from Greek. Cyrillic and Latin are two alphabets that derive from the Phoenician alphabet, by way of the Greek alphabet. And someone who reads Russian Cyrillic fluently is not more likely to understand spoken or written Greek than someone who reads English fluently.

-Linnéa

740faktorovich
Mar 23, 2022, 2:06 pm

>738 prosfilaes: You are right, it is not an "if", but rather a certainty that a century of British history has been misattributed and my study explains the facts of what took place. If historians have been merely repeating the version of history proposed by propagandists, they have been functioning either as propagandists themselves or as plagiarists or re-writers of the stories told by predecessors.

If you care about the impact of KJB on later translations, you have to engage in more than a little translation or to dive into the preceding and later versions to perform a full comparative analysis of the changes. I am merely stating the fact that KJB is the first full English translation of the Old and New Testaments and it was ghostwritten by Verstegan with Harvey's help. The KJB has since become the standard English Bible used internationally with only minor modernizations in later versions. I am an atheist, so as far as I am concerned the first manuscripts of the Abrahamic Bible were a series of fictions, and anything that was added in KJB was merely editing and expanding a fictional narrative.

There are plenty of mistakes in the "666" quotes you used earlier. For example, somebody else commented: "There is a variant preserved in ancient MS that gives 616. But 666 was in the medieval Vulgate". If any of the variants used 616, and others used 666, one of these writers introduced an error. If you look at each of the individual words in the variants surrounding these "666"/"616" references you will find similar inconsistencies, errors, and various types of biased editing during each translation.

The earliest two full surviving Bibles are in Greek and Latin. All of the preceding variants are not "full" texts, but rather fragments. Yes, there were later variants in full in other languages; I did not dispute this point.

In the Summer 2021 issue of my PLJ journal I reviewed: Steven Roger Fischer, "A History of Writing" (London: Reaktion Books Ltd., 2001/2021). As part of this review I mentioned a diagram of “Afro-Asiatic” languages (330). It shows that the term “Semitic” refers to a family of alphabets stretching from the West Semitic Alphabet in 1500 BC to most of the modern Middle Eastern, European and Indian languages that are used today (Hebrew, Latin, Greek, Indic, Arabic, as well as Hebrew)... You can learn more about this book from the rest of my review, or you can read this book to understand why you are wrong in thinking scientists have not proven there is a tree that connects all of the world's languages. You are saying a bunch of nonsense about "Russian" and "English" not being part of the main branches in the tree. Just read this book before saying anything else on this point.

741faktorovich
Mar 23, 2022, 2:30 pm

>739 anglemark: My knowledge of the Cyrillic Russian alphabet means that I am more familiar with the Greek alphabet and language structure because Cyrillic was developed based on the Greek alphabet. Meanwhile, the Latin alphabet was the basis for Germanic and Romance languages. I studied Italian in high school, and the Italian language is especially close to the dead Latin language because Latin was developed on the Italian Peninsula before it spread across the world via the Roman Empire. If you look in a dictionary of almost any modern language you will find some words that are classified as originating from Greek or Latin, and probably no words that have been proven to derive from earlier languages such as Egyptian hieroglyphs. Are you trying to use pre-hieroglyphs languages to prove knowledge of the Cyrillic alphabet does not help with comprehension of Greek vs. a lack of knowledge of any Greek-based alphabet?

742anglemark
Edited: Mar 23, 2022, 4:53 pm

>741 faktorovich:
"My knowledge of the Cyrillic Russian alphabet means that I am more familiar with the Greek alphabet and language structure because Cyrillic was developed based on the Greek alphabet."

Do you understand the difference between a language and an alphabet?

The Cyrillic alphabet is derived from the Greek alphabet.

The Latin alphabet is also derived from the Greek alphabet.

This is shown on page 296 in A History of Writing. (It might be a different page if you have the 2021 edition.) Refer back to that book, keeping in mind that Fischer discusses alphabets and other writing systems. He refers to the "Afro-Asiatic writing tradition", which is not the same thing as the Afro-Asiatic language family.

The Russian language is not derived from the Greek language. It is part of the Slavic branch of languages, in the Indo-European language family. Greek is not a Slavic language.

"Meanwhile, the Latin alphabet was the basis for Germanic and Romance languages."

No. The Latin alphabet is not a basis for any language. I get that it can be confusing for a layman that the alphabet we use is called "Latin" even though many languages that are not derived from Latin use that alphabet. But that's how it is.

"I studied Italian in high school, and the Italian language is especially close to the dead Latin language"

Yes! You got that right. Latin is the "ancestor" of the Romance languages in the Indo-European language family.

"If you look in a dictionary of almost any modern language you will find some words that are classified as originating from Greek or Latin, and probably no words that have been proven to derive from earlier languages such as Egyptian hieroglyphs. "

Here you get a little confused again. Egyptian hieroglyphs are a writing system, from which the Phoenician alphabet derives (again, see Fischer's book), and as I mentioned above, the Greek alphabet derives from the Phoenician alphabet. But once again, an alphabet is not a language.

"Are you trying to use pre-hieroglyphs languages to prove knowledge of the Cyrillic alphabet does not help with comprehension of Greek vs. a lack of knowledge of any Greek-based alphabet?"

What pre-hieroglyphs language? And what does the second part of that sentence even mean?

What I said was this:

The Greek alphabet is the ancestor of the Cyrillic alphabet. The Greek alphabet is also the ancestor of the Latin alphabet, the one we are using right here. As a result, the fact that you and I and Johan (the other Anglemark), and probably other people in this discussion thread as well, are familiar with the Cyrillic alphabet does not automatically make us more familiar with Greek – and it certainly doesn't make us experts on Koine Greek. The original question was to do with the Bible, after all.

-Linnéa

743prosfilaes
Mar 23, 2022, 7:10 pm

>740 faktorovich: The KJB has since become the standard English Bible used internationally with only minor modernizations in later versions.

If you mean the KJV as sold as the KJV, it is the most common English bible, though questionably standard. But new translations have more than minor modernizations; they are in fact new translations.

"There is a variant preserved in ancient MS that gives 616. But 666 was in the medieval Vulgate". If any of the variants used 616, and others used 666, one of these writers introduced an error.

Okay, so can I understand that you no longer claim that the translators of the KJV willfully made changes to their version of the Bible, and this is all about uninteresting accidental errors and biases?

A History of Writing? The history of writing is fascinating, and distinct from the history of languages. Even then, scientists have not proven there is a tree that connects all of the world's writing systems. Chinese writing seems unrelated to the Semitic writing systems, even if most other writing systems trace back to one of those two.

You said a bunch of nonsense about "Russian" and "English" not being part of the main branches in the tree.

You claimed "The branches diverge into the Latin and the Greek-derived branches", which is not true for writing (Latin is derived from Greek), and the Latin and Greek branches of Indo-European are co-equal with the Balto-Slavic and Germanic branches that Russian and English respectively land in.

744faktorovich
Mar 23, 2022, 9:12 pm

>742 anglemark: Yes, I understand the difference between "language" and an "alphabet". I have been explaining the difference to you guys, and instead of understanding my explanations, you guys keep preferring to insult me in exchange by suggesting I do not understand these terms. You really should unglue yourself from elementary-school tactics.

I found this simple diagram to explain to you the relationship between Cyrillic, Greek and Latin: https://commons.wikimedia.org/wiki/File:Venn_diagram_showing_Greek,_Latin_and_Cy.... There are some letters that appear in Russian that also appear in Greek and/or Latin. The relationship between English and Greek is much lighter.

You linked to the book, and not to the specific page. You can just look at the other pages around that page if you think you have an older edition than my newest 2021 edition.

You are looking at the language tree upside-down when you make statements like, "Greek is not a Slavic language." Yes, Russian is derived from Greek, and not the reverse. I did not claim Greek is a Slavic language. Find a tree of languages and study the branches between Greek and the Cyrillic alphabet. Think for a long while. Perhaps turn your tree the right way up. And then return to this conversation.

Both the Latin language and alphabet are indeed the "basis" ("the underlying support or foundation for an idea, argument, or process") for most modern languages. The basic letters, some grammatical rules, and many of the words have been borrowed from Latin and adopted into languages across Europe, the Middle East and other regions, and in the modern day these borrowed-from-Latin words have been re-borrowed and adopted into all other parts of the language tree, as, for example, English words are adopted internationally. Other languages, such as Greek, have also been the "basis" of many languages. You are making completely irrational statements just for the sake of contradicting me.

An alphabet is the set of letters that makes up the smallest pieces of a language. "Egyptian hieroglyphs" are not merely an "alphabet", as there are some hieroglyphs that represent words or syllables, and this term stands for the entire Egyptian system of writing; and thus, the term "Egyptian hieroglyphs" can be used to describe the language of the Egyptians.

A "hieroglyph" is an image that represents a specific word, sound or letter. Before hieroglyphs people would just draw on walls or the like images that meant something to them, but was not necessarily repeated by anybody else because there were no established linguistic rules. I can't address your general confusion about my meaning. You have to be specific.

All those who are fluent with the Cyrillic alphabet are a bit more capable of understanding Greek than those without fluency in Cyrillic. Refer to the diagram above if you are confused.

745faktorovich
Mar 23, 2022, 9:26 pm

>743 prosfilaes: "The translators of the KJV willfully made changes to their version of the Bible, and this is all about uninteresting accidental errors and biases?" There is no difference between making "willful changes" and introducing "biases". "Accidental errors?" If the "616" vs "666" divergence was an accident, you are really saying the translator was so distracted he or she thought the 6 was a 1, or maybe was so sleep-deprived she wrote 1 instead of 6? Or are you saying this translator mistook a ten for a sixty because he or she did not understand this term in the other language? The translator might have just inserted the error as a joke. I have no idea where you guys are going with your biblical-inerrancy argument. I have been saying exactly what I have been trying to say, and have not changed my position.

Even if there was no single root of all human languages in the past, the intermingling of languages in our present globalized world has meant that there is now a single tree that connects all languages. Greek can be earlier in the tree, but it can also have its own branch, while Latin has its own branch as well - you are just pointing out where the branch division begins, which is not something I have objected to. It is irrational to think some languages are not "equal" to other languages. They are obviously all structures for expressing the world in words, letters, syllables etc. I am not making a value judgement against English as too Germanic to understand Greek. I am simply stating that the Cyrillic alphabet is derived from the Greek alphabet, and the Italian language is one of, if not the, closest languages to Latin; the Cyrillic alphabet is closer than the English alphabet to the Greek alphabet; and the Italian language as a whole is closer than English to the Latin language. Don't ask me to explain these points. Read the "History of Writing"; I cited it to invite you to learn more about this topic there vs. from me.

746prosfilaes
Mar 23, 2022, 10:33 pm

>744 faktorovich: I found this simple diagram to explain to you the relationship between Cyrillic, Greek and Latin: https://commons.wikimedia.org/wiki/File:Venn_diagram_showing_Greek,_Latin_and_Cy... There are some letters that appear in Russian that also appear in Greek and/or Latin. The relationship between English and Greek is much lighter.

That Venn diagram has four letters in the intersection between Latin and Greek, excluding Cyrillic, and three letters in the intersection between Cyrillic and Greek, excluding Latin. Why are you citing it?

Yes, Russian is derived from Greek, and not the reverse.

No, Russian is derived from Old East Slavic, which is derived from Proto-Balto-Slavic.

>745 faktorovich: There is no difference between making "willful changes" and introducing "biases".

One is willfully making changes that don't correspond to the original, and the other is translation choices that are defensible, especially in isolation.

If the "616" vs "666" divergence was an accident, you are really saying the translator

I.e. you haven't looked up anything about 616 vs. 666, which is about disagreeing Greek sources, not translation.

I am simply stating that the Cyrillic alphabet is derived from the Greek alphabet, and the Italian language is one of, if not the, closest languages to Latin; the Cyrillic alphabet is closer than the English alphabet to the Greek alphabet; and the Italian language as a whole is closer than English to the Latin language.

You're conflating alphabets with languages, and I'm not sure you wrote what you meant to write.

Read the "History of Writing";

Which isn't relevant; we said that knowing Cyrillic is not relevant in learning Greek, as the Greek alphabet is not a significant hurdle to learning the language. That is a question in language education, not about writing.

747lorax
Mar 24, 2022, 8:54 am

A question for you, faktorovich, since I still don't know whether you think an alphabet is a language, that the choice of alphabet is the most important factor in ease of language learning, that languages that use the same alphabet are more closely related than those that don't, or something else:

The Urdu language uses an alphabet derived from the Arabic alphabet.
The Hindi language uses the Devanagari script.

Which language would you expect to be easier for a native speaker of Urdu to learn, Arabic, written in a closely related script, or Hindi, written in an entirely unrelated script? Does your answer differ if you're thinking primarily about spoken vs written language? Were you previously familiar with the linguistic relationships, if any, between these languages?

748faktorovich
Mar 24, 2022, 1:11 pm

>746 prosfilaes: Yes, all languages are interconnected. If you don't remember what this argument is about, it is probably over.

To "derive" means "obtain something from (a specified source)". Greece was an empire before Rome, and in parallel the ancient Greek language of Greece formed before the Latin language of Rome. Russian is a few variants up the language tree; Russian is derived from both Greek and Latin; Russian's Cyrillic alphabet is derived more from the Greek alphabet than from the Latin alphabet; the point that there were other variants between Greek - Latin - and Russian is irrelevant. A dictionary of Russian, Italian or English includes some words derived from both Greek and Latin. Familiarity with the Cyrillic alphabet means one is more familiar with the letters of the Greek alphabet.

A biased change is also a willful change, and a willful change must be made out of an inherent or profitable bias. This discussion proves that you "can" attempt to defend any false position. Even if you can defend all willful/biased changes; these are still changes to the meaning that the original authors of the biblical fictions intended.

I am "conflating" (meaning: "combine (two or more texts, ideas, etc.) into one") the "alphabet" with "language" because an alphabet is a part of language, so failing to conflate or combine the alphabet into a language or isolating the alphabet from language is impossible.

Wow. And why haven't you attempted to argue that my familiarity with Hebrew is irrelevant because new vocabulary has entered the Hebrew language since the time the ancient Hebrew Bible was written?

749faktorovich
Mar 24, 2022, 1:27 pm

>747 lorax: To clarify, I have under-explained my knowledge of foreign languages. I also lived in China for a semester while teaching at a university, and learned a bit of Chinese linguistics, as I was teaching the differences between Chinese and English. I also spent some of my youth in Ukraine, so I was exposed to some Ukrainian. I have also been a fan of nineteenth-century French fiction (reading most of the greats), and translated some French out of necessity. My grandparents spoke mostly in Yiddish, so I was exposed to some of that language. And I summered in Tallinn as a child, which exposed me to Estonian. I am curious to see if you can find a way to use all these other languages in your case against me as a human capable of translation.

The Cyrillic alphabet is used in many different countries and by many different languages. It is difficult for somebody to learn a new alphabet fluently after they are around 14. An innate familiarity with a second alphabet from a different branch of the language tree helps a person to grasp a bit more innately when studying the rest of that branch of the tree than somebody to whom that branch is entirely foreign. I am qualified to translate Renaissance texts that use quotes and fragments in multiple languages aside from Early Modern English and its predecessors (Greek, Latin, Italian, French, etc.) because I have spent my life diving into new languages and finding ways to understand them.

750lorax
Mar 24, 2022, 2:16 pm

You have not answered my question. I am not asking about your facility with different languages, nor about your ability to translate.

I asked a very simple, straightforward question about the relative ease of learning languages using different scripts with one specific, concrete example.

751Keeline
Mar 24, 2022, 2:57 pm

>748 faktorovich:

A biased change is also a willful change


This assertion causes me some concern. In most examples, a bias is largely unconscious. It is often a product of one's circumstances and background. That is practically the opposite of a "willful" or conscious decision.

James

752prosfilaes
Mar 24, 2022, 8:37 pm

>748 faktorovich: isolating the alphabet from language is impossible.

Linguists disagree; many don't consider writing a part of language at all. I think that's a bit excessive, but even just 500 years ago, most languages had never been written, and most people were illiterate, even speakers of languages with writing. Many languages have multiple scripts; Serbo-Croatian is written in both Latin and Cyrillic, and a number of Central Asian languages went through a number of scripts: Arabic, then Latin (under the Soviets), then Cyrillic (also under the Soviets), and in modern times back to Latin.

>749 faktorovich: It is difficult for somebody to learn a new alphabet fluently after they are around 14.

Citation needed. That would require a specific test with test subjects that I don't know that anyone has done. I'm not even sure what it means; I'm pretty sure that if I mapped Latin letters to Greek ones, I could, within hours, read English in those letters nearly as fast as I can read English in Latin letters. I'm not convinced it would be any easier to learn Greek in Latin letters than it is to learn Greek in Greek letters.
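
As a toy illustration of that thought experiment, here is a minimal Python sketch; the lookalike mapping below is invented for illustration and is not a standard transliteration scheme.

```python
# Toy sketch: English text rendered in Greek letters via an invented,
# partial lookalike mapping; letters without an entry (h, w, etc.) pass
# through unchanged. This is not a standard transliteration scheme.
latin_to_greek = str.maketrans({
    "a": "α", "b": "β", "d": "δ", "e": "ε", "g": "γ", "i": "ι", "k": "κ",
    "l": "λ", "m": "μ", "n": "ν", "o": "ο", "p": "π", "r": "ρ", "s": "σ",
    "t": "τ", "u": "υ", "x": "ξ",
})
print("reading english in greek letters".translate(latin_to_greek))
# -> ρεαδινγ ενγλισh ιν γρεεκ λεττερσ
```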

753faktorovich
Mar 24, 2022, 9:05 pm

>750 lorax: I am going to quote your original question and address it. You stated: "the choice of alphabet is the most important factor in ease of language learning". No, familiarity with an alphabet is not the "most important factor" to learn a language. Familiarity with an alphabet assists with reading texts, and understanding their phonetics. "That languages that use the same alphabet are more closely related than those that don't". Yes, languages that share an alphabet are "more closely related", by definition; the sharing of the alphabet is how they are related.

"The Urdu language uses an alphabet derived from the Arabic alphabet.
The Hindi language uses the Devanagari script.

Which language would you expect to be easier for a native speaker of Urdu to learn, Arabic, written in a closely related script, or Hindi, written in an entirely unrelated script? Does your answer differ if you're thinking primarily about spoken vs written language? Were you previously familiar with the linguistic relationships, if any, between these languages?"

One of the only regions on earth I have not yet visited is India, so I am not familiar with these languages. I have looked at these two alphabets; visually they appear to be in related parts of the language tree. Since both Urdu and Hindi are used in India, these two languages should have many shared words in their dictionaries, so somebody who speaks one of these languages should understand some of what the other language's speaker is saying. Both languages are also likely to have a lot of words borrowed from English and its predecessors like Latin and Greek, after the hundreds of years of cultural colonialism. The Arabic alphabet is a descendant of Egyptian hieroglyphs, like Greek; and it is in the same branch of the tree with Hebrew. Arabic and Hebrew are further away from each other than Greek and Cyrillic. I only think of a spoken language when I am considering regional/class dialects, whereas in other cases I care about the written language as it is recorded in dictionaries, as this is how translators digest languages to present them to readers.

754faktorovich
Mar 24, 2022, 9:13 pm

>751 Keeline: You might be thinking of this definition of bias: "prejudice in favor of or against one thing, person, or group compared with another, usually in a way considered to be unfair." Whereas, I am thinking more of: "a systematic distortion of a statistical result due to a factor not allowed for in its derivation." When I think of a biased translation edit, I am visualizing an intentional distortion of not the statistics, but the intended meaning in a word, phrase, sentence or the entire book. For example, introducing an error that switches the wronged party from being "Democrats" to "Republicans", or changing who is "good" or "evil" in a history by changing the symbolism (such as altering black to white, or red to the sunny yellow). It is perhaps possible to unconsciously make such biased changes, but the translator would have to be in a trance not to notice this process.

755faktorovich
Mar 24, 2022, 9:26 pm

>752 prosfilaes: A single alphabet can be reused in many different languages. When any alphabet is used together with any language, the two cannot be "isolated" without taking out all of the letters or the entire written text that is being considered. If a language is spoken without any associated written alphabet, then yes, the illiterate speakers have managed to isolate language from the alphabet; but if they do start using an alphabet, it remains true that an alphabet can never be isolated from language.

I read about the 14-year cut-off in grad school. In a brief search for sources that state this point I found Ian Mackenzie's "English as a Lingua Franca" (2014; page 37): "Children acquire new dialects and languages perfectly up to about eight, but there is very little chance of learning a language variety perfectly from mere exposure after the age of about fourteen."

756anglemark
Mar 25, 2022, 3:13 am

>755 faktorovich:
"...there is very little chance of learning a language variety perfectly from mere exposure after the age of about fourteen."

This doesn't mean (or imply) that "it is difficult for somebody to learn a new alphabet fluently after they are around 14." It means that around that age, it becomes significantly harder to learn a language variety to a near-native level without studying the language actively. The paragraph in the book discusses language learning (in a classroom or through active self-study), in contrast to language acquisition ("from mere exposure", i.e. through being surrounded by speakers of the language). The examples Mackenzie mentions are things like grammatical gender and subject-verb agreement. These things are harder to pick up after the mid-teens, without actively studying the language (or variety).

In other words, nothing to do with the alphabet, and not a matter of "difficult to learn fluently".

-Linnéa

757faktorovich
Mar 25, 2022, 12:44 pm

>756 anglemark: You are arguing with yourself. You are making up problems, and then arguing with them. I made a very simple true statement, and supported it with a citation that explains the complexities of this issue. You have apparently read the contents around this citation, as you should have done, and are reporting these details. It remains true that there is a benefit to learning more than one language prior to 14. I did not make the absurd claim that it would be impossible for any human to learn any new alphabet after 14. What I am researching is the mixture of languages used by the British Renaissance Ghostwriting Workshop. Feel free to refocus future questions on my research, and not on Mackenzie's - if you have questions about Mackenzie's research, perhaps you can start a separate discussion thread about that book.

758prosfilaes
Mar 25, 2022, 8:33 pm

>757 faktorovich: It remains true that there is a benefit to learning more than one language prior to 14.

Not what the cite you pulled up says, and not something anyone was arguing against.

I did not make the absurd claim that it would be impossible for any human to learn any new alphabet after 14.

And no one said you did.

Feel free to refocus future questions on my research,

Have you put your method to the test and verified it against known results? Or have you confirmed your results on these works with other more standard methods? If the answer is no, I don't know what there is to talk about.

759faktorovich
Mar 26, 2022, 1:29 pm

>758 prosfilaes: I have tested my method against all of the previous computational-linguistic studies of the British Renaissance that I could find. I explain the reasons why these previous studies are incorrect across the BRRAM series. And I have confirmed my findings with "standard" methods such as handwriting analysis; here is the link to the handwriting samples I used in this analysis: https://github.com/faktorovich/Attribution/blob/master/Illustrations%20of%20Hand.... I have also found various confessions of ghostwriting by the ghostwriters in letters/texts, documented proof of fiscal and other fraud, etc. None of you have read any of this evidence, as your comments prove. Instead of addressing a fragment of this evidence, like the handwriting comparison, you guys object to absurd things like that the handwritings are wiggly or scribbly, without even commenting on whether the different bylines share handwriting in line with my computational-linguistic re-attributions of these groups of bylines to a single ghostwriter.

During my translation work yesterday, I came across this section of Verstegan's "Restitution" that addresses the questions you guys have been asking about how the Workshop made up theological fictions that have been repeated as if they are ancient facts into modern times. There are numerous similar annotations I make across BRRAM. Verstegan writes:

"Josephus affirms that they made the foundation so deep and spacious that albeit the Tower was of such a great height (as by some writers is declared) yet it seemed to be far larger and broader than high. It contained in height, as Isidore says, five thousand and one hundred sixty-four paces (which may be understood for the paces then used)."

Here is my annotation comment:

This is a reference to Isidore of Seville’s (560-636) "Etymologies", “Book IX”. Isidore does not mention any specific measurement for Babel. The pace measurement unit is equivalent to either one step (around 30 inches) or two steps. Modern theologists tend to quote Verstegan’s 5,164 paces figure (or 12,910 feet high; in comparison the current highest building, Burj Khalifa, is 2,717 feet) and the Isidore citation without citing Isidore directly. Verstegan clearly made up this figure, as it is not to be found in Isidore or other predecessors.
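
As a quick check of the unit conversion in that annotation, here is a minimal arithmetic sketch; it assumes the single-step pace of roughly 30 inches mentioned above (a two-step pace would double the result).

```python
# Back-of-the-envelope conversion of Verstegan's figure, assuming a
# single-step pace of about 30 inches (a two-step pace would double it).
paces = 5_164
inches_per_pace = 30
feet = paces * inches_per_pace / 12
print(feet)  # 12910.0 -- the ~12,910-foot figure quoted in the annotation
```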

760prosfilaes
Mar 26, 2022, 2:42 pm

>759 faktorovich: I have tested my method against all of the previous computational-linguistic studies of the British Renaissance that I could find. I explain the reasons why these previous studies are incorrect across the BRRAM series.

Do you not know what I mean, or do you not care?

Instead of addressing a fragment of this evidence, like the handwriting comparison, you guys object to absurd things like that the handwritings are wiggly or scribbly,

My complaint was that you're handing me something as proof that I am totally unable to verify.

>136 faktorovich: If you cannot summarize the steps in 1 paragraph; it is not a method that can be tested without specialized knowledge, and thus it is only for insiders, and this makes it very easy for specialists in this field to manipulate data because their results cannot be checked or clearly understood even by advanced literature research specialists, and probably not even by computer programs in other fields.

Your standard. I'd prefer a serious handwriting expert speaking on the matter, but it's certainly something that can't be tested without specialized knowledge.

the questions you guys have been asking about how the Workshop made up theological fictions that have been repeated as if they are ancient facts into modern times.

I believe I was the only person on the subject, and I was asking specifically about the KJV.

761faktorovich
Mar 26, 2022, 9:04 pm

>760 prosfilaes: Your conclusion, "Do you not know what I mean, or do you not care?" is precisely true about yourself. You have not read Volumes 1-2 of BRRAM, so you do not know how I have verified and tested my method against prior methods. And you apparently "do... not care" about this, as you still haven't requested a review copy of the series, and thus have not made the minimum attempt to learn about my research.

The handwriting analysis is explained across the BRRAM series with various types of verifying evidence. The file I linked to includes the cited handwritten texts grouped by the authorial-signatures. So you should be able to see the similarity of the handwriting in each of these groups without any further evidence or research. If you want to read the detailed explanations of why/how etc. these match, you'd have to read BRRAM. You can find a "serious" handwriting expert of your choosing, and ask them to look at the evidence; I posted it publicly for free to assist all other handwriting analysts with accessing these documents and checking my findings.

It is absurd to be a theologian who cares about errors in KJV, but not about other errors around KJV that have been repeated into modern times (such as the precise size of the Tower of Babel). The similarity of the errors introduced to KJV and Verstegan's self-attributed "Restitution" is a major piece of evidence for proving his authorship of the KJV translation; so if you do not care to consider this evidence, you are biased against the larger ghostwriting conclusion, no matter how overwhelming the evidence might be.

762prosfilaes
Mar 27, 2022, 5:15 pm

>761 faktorovich: Your conclusion, "Do you not know what I mean, or do you not care?" is precisely true about yourself.

Which is the exact form of a "tu quoque" argument.

you do not know how I have verified and tested my method against prior methods.

I didn't ask that. I asked if you'd shown your results with more standard methods, or if you'd tested your method against something with known results and got the expected results.

It is absurd to be a theologian who cares about errors in KJV, but not about other errors around KJV that have been repeated into modern times (such as the precise size of the Tower of Babel).

You think yourself knowledgeable in all fields, when you obviously don't have enough theology to make that claim.

To quote Wikipedia:
Sola scriptura, meaning by scripture alone, is a Christian theological doctrine held by some Protestant Christian denominations, in particular the Lutheran and Reformed traditions of Protestantism, that posits the Bible as the sole infallible source of authority for Christian faith and practice. ... {it} rejects any infallible authority other than the Bible. In this view, all non-scriptural authority is derived from the authority of the scriptures or is independent of the scriptures, and is, therefore, subject to reform when compared to the teaching of the Bible.

That is, in many branches of Christianity, errors in non-Biblical sources are of minimal importance, whereas errors in Biblical translations are of vital importance. Wikipedia gives several figures for the height of the tower, and no evidence that anyone really cares about any of them; I personally don't see how it matters in the least how tall the Tower of Babel allegedly was.

The similarity of the errors introduced to KJV and Verstegan's self-attributed "Restitution" is a major piece of evidence for proving his authorship of the KJV translation

You've declined to give any evidence of any errors introduced to the KJV.

763faktorovich
Mar 27, 2022, 9:11 pm

>762 prosfilaes: My method is to combine 27 different "known" methods or test-types. I describe the word-frequency method that is standard in the field of computational-linguistics in BRRAM and earlier in this discussion. I explain why the manner in which it has previously been applied is erroneous in both as well. I did test these "accepted" methods myself and also asked one of the researchers in this field to run a set of texts to see how the raw data would differ; this computational-linguist could not generate results that matched most of the byline-matching texts to each other, whereas when I applied my 27-test method all of these byline-matching texts accurately showed they were like each other in this particular group of texts we were testing. There are many similar byline-matches across the 284 texts in my Renaissance corpus.
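
For readers unfamiliar with the word-frequency approach being debated here, the sketch below shows, in minimal form, how such a comparison is typically set up. It is a generic illustration, not Faktorovich's 27-test method: the function-word list and sample texts are placeholders, and real studies use hundreds of function words, full-length texts, and normalization schemes such as Burrows' Delta.

```python
# Minimal, generic sketch of a word-frequency authorship comparison.
# The word list and texts are illustrative placeholders only.
from collections import Counter
import math

FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it"]

def profile(text: str) -> list[float]:
    """Relative frequency of each function word per 1,000 tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [1000 * counts[w] / total for w in FUNCTION_WORDS]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Higher similarity between two texts' profiles is treated (with many
# caveats) as weak evidence of a shared linguistic signature.
text_a = "the quality of mercy is not strained it droppeth as the gentle rain"
text_b = "to be or not to be that is the question"
print(cosine_similarity(profile(text_a), profile(text_b)))
```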

I spent a year studying full-time in the first Synagogue to open in Moscow after religion was finally allowed. Then, I spent another year studying full-time in the Chabad-Lubavitch synagogue, during the period that paralleled Rabbi Menachem Mendel Schneerson's death in 1994; he died without leaving an inheritor of his Messiah-like position. (I was also briefly in a few different cults across my youth.) I attended regular services across my college years, before becoming an atheist, and studied different theological texts. Can you clarify why I am not theological enough for you?

As an atheist, I find strict adherence to any fictional doctrine to be irrelevant. In "Restitution", my goal is to determine what theological fictions Verstegan added that have since become part of the accepted Abrahamic theological narrative, assumed by most to be from the Scriptures, and not from a later scholar/fiction-writer.

You already gave the "616"/"666" error to KJB. If you refuse to acknowledge an error is an error when it comes to the Bible, it is pointless to engage in a debate on anything related to the Bible.

764prosfilaes
Mar 27, 2022, 9:58 pm

>763 faktorovich: I spent a year studying full-time in the first Synagogue ... Can you clarify why I am not theological enough for you?

Why would you think that any amount of time studying in a synagogue would make you familiar with the practices of Protestantism?

my goal is to determine what theological fictions Verstegan added that have since become part of the accepted Abrahamic theological narrative, assumed by most to be from the Scriptures,

Why would you assume that there would be any? Why would the Roman Catholics and Eastern Orthodox groups care what's written by some heretic in England of all places? Much less the Baha'i and Muslims?

You already gave the "616"/"666" error to KJB. If you refuse to acknowledge an error is an error when it comes to the Bible, it is pointless to engage in a debate on anything related to the Bible.

The 616 appears in various manuscripts that weren't available to the translators of the KJV. Even today, not mentioning it in a modern translation wouldn't be a clear error; the translator picks the source text they're translating, and both 616 and 666 have arguments for them. The KJV translators translated the Textus Receptus, and errors in the translation should be measured against that.

765faktorovich
Mar 28, 2022, 12:54 pm

>764 prosfilaes: I have no idea why you objected that I am insufficiently "theological". I responded to your question by disclosing my theological background. I have been studying the English Protestant Church across the past 18 years, from the time I started studying Comparative Literature, which included comparing British to Russian literature (neither of these literatures can be separated from the Christian churches that dominate their cultures). And I have certainly been closely studying Protestantism across the past 3 years of researching the British Renaissance.

Most religious people have not read the founding texts, such as the nearly million-word-long Bible, and instead base their faith on abstract imagery of God and his deeds, including the outcome of the building of the Tower of Babel. The enormous size of Babel is more captivating than a minor point regarding morality or what is kosher to eat in the Bible. The "Saxon" propaganda Verstegan started contributed to later racist theorists' agenda, including the concept of the superiority of the German race in WWII. The current anti-Muslim racism is causing the West to support wars on Muslim countries that kill millions without achieving any morally positive outcome. An enormous volume of these currently seemingly intuitively known concepts were developed by Verstegan at the dawn of print. If you don't care about any of this, what on earth can you care about?

"Restitution" has never been translated before, and only a few scholars have noticed only a tiny portion of the frauds/ falsehoods Verstegan introduced in this text that have become history. Thus the "arguments" about the specific falsehoods have never been detailed in the type of annotations I am adding across this impactful text. I am not translating the Bible itself because, as you say, there have been plenty of scholars who have discussed this worshipped text.

766prosfilaes
Mar 28, 2022, 8:47 pm

>765 faktorovich: The current anti-Muslim racism is causing the West to support wars on Muslim countries that kill millions without achieving any morally positive outcome.

This, of course, has nothing to do with the Crusades or the Ottoman Empire or the Reconquista or American Protestantism or general European imperialism combined with oil combined with 9/11.

The "Saxon" propaganda Verstegan started contributed to later racist theorists' agenda, including the concept of the superiority of the German race in WWII.

Not that anyone else has noticed that. It could be that an obscure Anglo-Dutch author was key to the Nazis' propaganda, or it could be that the Nazis built on a long history of antisemitism unbased in anything English and called upon whatever theories and authors supported them, with little critical analysis.

An enormous volume of these currently seemingly intuitively known concepts were developed by Verstegan at the dawn of print. If you don't care about any of this, what on earth can you care about? "Restitution" has never been translated before,

So a work that HathiTrust has no record of being reprinted in any form between 1634 and 1976 is nonetheless the key origin of many major items in modern history. Color me skeptical. You certainly haven't offered any evidence.

>717 faktorovich: Verstegan ghostwrote pretty much all of the English sermons, and most of the theological Protestant and Catholic pamphlets from around a century of British history,

Right; a guy whom history records as dropping out of Oxford because he refused to swear an anti-Catholic oath and was then driven out of England in 1581, but in your history he wrote everything for everyone.

767faktorovich
Mar 29, 2022, 2:04 pm

>766 prosfilaes: Elizabeth I made a deal with the Ottoman Empire after she was excommunicated from the Catholic Church, and this deal was retracted after James I took the throne just before "Restitution" was published in 1605. Verstegan had been exiled from England since 1581, and he had become increasingly angry about how he was treated by all sides, and he clearly demanded "restitution for decayed intelligence" of folks who had been hiring him as their ghostwriter/propagandist, like Elizabeth, James, Spain, the Pope, and the rest. I explain how all of these are connected across BRRAM, and I specifically focus on explaining the pro-German/Dutch/"Saxon" racism that Verstegan established with "Restitution" in this work. He had found a place for his press in the German/Dutch region, whereas he had not found either England or Rome to be welcoming to his attempts to profit from publishing. Verstegan ghostwrote "Raleigh's" posthumous "prison" manuscripts, and he ghostwrote other travelogues of early colonialism that made up a lot of fiction, as Verstegan had not himself gone on all of these voyages he was describing. You really have not read any part of BRRAM, as I explain each of these points in depth in different sections/volumes.

You can go on imagining what inspired the Nazis or other racists, or you can read my "Restitution" translation when it is finished, and you will find out the origin of these ideas at the dawn of print. Demonizing other religions and ethnicities as barbaric and inferior was cemented in the Renaissance, and by ignoring these roots, humanity is repeating these Medieval biases as if they are not propagandistically and artificially built constructs.

I pointed to the single annotation regarding the size of Babel I made that explained just one phrase - this measure never appeared before - and after "Restitution" it was quoted by many theologists, and frequently without a reference to Verstegan. There are similar pieces of evidence in nearly every phrase of "Restitution", and really all the other volumes in BRRAM.

Verstegan being kicked out of Oxford, and then exiled from England, created an extraordinarily deep-seated rage that made him unsympathetic as other printers/writers/priests/"witches" were executed over the things he was writing. He appears to have returned to England by 1594 or so under a pseudonym, and he had to keep using pseudonyms across his career because his exile was never revoked (even as one of his later books was published by James's official King's Printer). If you are having trouble believing fragments you are reading about in a chat, don't you think reading the books is the first step to understanding what I am referring to? Why would anybody begin believing in any new findings without reading the findings themselves?

768prosfilaes
Mar 29, 2022, 10:00 pm

>767 faktorovich: You can go on imagining what inspired the Nazis or other racists, or you can read my "Restitution" translation when it is finished, and you will find out the origin of these ideas at the dawn of print.

I can go on reading people who have studied the Nazis or the Neo-Nazis or other forms of modern racism, or I could read a scholar of Renaissance English who tells me that a Renaissance English work explains it all.

It's just so conspiracy theory, this desire to boil everything down to one work known primarily to you.

I pointed to the single annotation regarding the size of Babel I made that explained just one phrase - this measure never appeared before - and after "Restitution" it was quoted by many theologists, and frequently without a reference to Verstegan.

The Tower of Babel is Biblically enormous; who cares if one estimate is quoted?
Glancing at HathiTrust, it appears a few times, but only a few. Pieter Bruegel the Elder painted the Tower forty years before Restitution was written, and according to HathiTrust, it's been referenced way more than the 5,164 paces measurement.

He appears to have returned to England by 1594 or so under a pseudonym

History records that he lived in the Netherlands between 1585 and 1640, and published works in the Nieuwe Tijdinghen printed in Antwerp in that period. But if we assume that he's a ghostwriter over that period, then we have to change history.

Why would anybody begin believing in any new findings without reading the findings themselves?

I can't begin to read the vast range of research out there. So I listen to experts, and if I'm skeptical, I look for other experts. I apply what I know, but I try and know my limits; I'm an expert in programming languages and roleplaying games, but not most other fields. I'm skeptical and there doesn't seem to be anyone agreeing with you. It's amusing to argue with you, but there's other things I really should be doing, and things I'd much rather be reading.

769faktorovich
Mar 30, 2022, 9:03 am

>768 prosfilaes: Either you want to learn what has been going on in the past, or you do not. This thread is about who really wrote the British Renaissance. The answer is in my BRRAM series. I would have stopped at a single article that explains the computational-linguistic findings if that were the end of the mystery. But with 17 volumes in the series so far, there are many other mysteries that this original question has led me to that are of interest to humanity. Nobody can reach a conclusion regarding the veracity of scholarly findings without reading these findings. Yes, my "Restitution" volume will address racism as it pertains to the myth of the grandeur of the Germanic tribes, of which Old English and Old England have been proposed to have been members; the cross-migration of populations across the world (which was described by ancient historians such as Strabo’s "Geography", which describes Britain’s inhabitants as mostly following the traditions of their Celtic neighbors and being similarly “barbaric”, as they are not even practicing agriculture, while the section on the Germanic tribes explains that Celtic, Thracian and various other tribes are “mingled with these”) and the wildly different maps over the centuries make this into an absurd claim. There is nothing unbelievable in what I am saying; I am merely presenting overwhelming evidence to support alternative versions of history to explain the irrationality of most problems that are causing wars/strife. You refusing to read my research, while continuing to object for the sake of objecting, is a conspiracy theory; me carrying on with my research while giving away over 20,000 free review copies of the series is a researched theory.

"Pieter Bruegel the Elder painted the Tower": I counted 20 floors in Bruegel's Tower. 5,164 paces = 12,910 feet high; in comparison the current highest building, Burj Khalifa, is 2,717 feet, and it has 163 floors; so Babel would have had around 775 floors instead of 20 floors. My article is forthcoming in "Critical Survey" that explains the extreme exaggeration of other hyperbolic statements Verstegan made such as the estimate that 10,000 people came to see a single Renaissance play in the “Thomas Nashe”-bylined and Verstegan-ghostwritten "Pierce Penniless His Supplication to the Devil" (1592), wherein he described a Strange’s Men’s performance of a Talbot play at the Rose to an audience of “ten thousand spectators at least (at several times)”. I estimate how many people could physically fit into a theater of Rose's size using its dimensions etc. What about my math is conspiratorial to you, vs. the conspiracy being in anybody believing these absurd numbers?

Verstegan also published works under his own byline with the King's Printer in London during those same years when he was supposed to be in exile. Changing the location of the publisher really just takes editing the title-page credit from London to Antwerp to WhoKnowsWhere. Verstegan's books were intercepted when his buyers attempted to cross borders into England with them (due to pro-Catholic subject matter being outlawed there), and these buyers were prosecuted in the 1590s, as he explains in his letters; so he had a motive to return, or to print the books that were to be sold in England in England itself, to avoid having his customers serve time in prison. It's a complex explanation that I detail in "Restitution's" introduction.

Well, I have begun to read the vast volume of research out there, and it is in need of editing, and that's why I'm writing the BRRAM series. If you lack the time to review the interior of a book series, you probably shouldn't be repeatedly calling it an unfounded conspiracy theory. You can just go about doing something else more fun, or you can argue in a manner that is more respectful towards scholarship you have not read, and thus have not properly evaluated, before libelously accusing it of irrationality.

770prosfilaes
Mar 30, 2022, 10:30 pm

>769 faktorovich: This thread is about who really wrote the British Renaissance. The answer is in my BRRAM series.

A very humble answer.

to support alternative versions of history to explain the irrationality of most problems that are causing wars/strife.

Which is an argument against your theory. Theories that claim to solve war and all the problems of humanity are inherently more questionable than theories that claim to clarify our understanding of one part of one subject. Every academic with a Ph.D. has produced a thesis that enhanced our understanding of some subject. Few produce revolutionary ideas, and most revolutionary ideas are wrong.

I counted 20 floors in Bruegel's Tower. 5,164 paces = 12,910 feet high; in comparison the current highest building, Burj Khalifa, is 2,717 feet, and it has 163 floors; so Babel would have had around 775 floors instead of 20 floors.

Which is all irrelevant to the question of whether 5,164 paces is a standard figure for the height of the Tower of Babel (it's not) and whether Bruegel's Tower is more iconic (it seems to be) and whether it matters at all (why would it?)

Verstegan also published works under his own byline with the King's Printer in London ...

The point remains; the more established history you have to disregard for your theory to work, the more skeptical I have to be. If you say that JFK wrote articles for an academic journal under a pseudonym in 1960, that's possible; if you think he did in 1965, that's quite improbable.

771MrAndrew
Mar 31, 2022, 8:30 am

The 1965 articles were ghostwritten by Elvis, duh.

772faktorovich
Mar 31, 2022, 12:32 pm

>770 prosfilaes: It would be absurd to dismiss all theories that "solve war and all the problems of humanity" as "inherently" "wrong" because by doing so humanity would be dooming itself to continue having these problems when a solution might have been found in the proposed theory. Thus, before any theory that is positively "revolutionary" (i.e. causing complete dramatic positive change) is dismissed it has to be given rational consideration, especially if it is not merely a "theory" but is also accompanied by a practical attribution method, and other applicable evidence and solutions.

Here are just a few of the sources I found on a brisk search for citations of the "5,164 paces" measure that are impacting humanity's perception of the precision of Biblical concepts:

https://www.newworldencyclopedia.org/entry/Tower_of_Babel
https://en.wikipedia.org/wiki/Tower_of_Babel
https://books.google.com/books?id=-etUAAAAcAAJ&pg=PA23&lpg=PA23&dq=t...
https://www.google.com/books/edition/The_History_of_the_World_Or_an_Account_o/_u...
https://www.google.com/books/edition/A_Commentary_on_the_Old_and_New_Testamen/Gj...

"The more established history you have to disregard for your theory to work, the more skeptical I have to be." You and the rest of humanity should spike at maximum skepticism at all times. It is absolutely necessary to be skeptical about all things. The term means "not easily convinced; having doubts or reservations." Skepticism is the driving force behind my re-attribution of the Renaissance and its history. After the data told me the accepted historical narrative is false, my skepticism was spiked, and I carefully researched the evidence. When this BRRAM series is finished I will have presented this evidence (at least in 20 books/volumes) to humanity for their skeptical review.

Intuitively, I have a hard time believing JFK could have written an article of any type, and I would be more easily convinced that he hired a ghostwriter who continued writing after JFK's death if you found linguistic evidence of similarity between JFK's writing prior to his death and texts under other bylines written after his death. I would have to research JFK's school papers through every other piece he published/wrote across his lifetime, as well as his credentials etc. to determine if my intuitive guess is right, or if it is indeed more likely that JFK did not die and lived on to become a professional ghostwriter (following your hypothesis). It is a good idea to think through such hypothetical questions. For example, Verstegan ghostwrote for "Raleigh" and other bylines after their deaths, so it is important to question whether Raleigh had instead not been executed and had lived on to ghostwrite for other bylines. It is unlikely that Raleigh could have pulled off an escape, and there is no supporting evidence of surviving early-life writing for Raleigh in his handwriting; so this hypothesis is far less likely than Verstegan's proven long life and his work as a publisher/writer.

773prosfilaes
Mar 31, 2022, 11:41 pm

>772 faktorovich: It would be absurd to dismiss all theories that "solve war and all the problems of humanity" as "inherently" "wrong" because by doing so humanity would be dooming itself to continue having these problems when a solution might have been found in the proposed theory.

Ĉu vi estas ankaŭ Esperantisto? Ĉu vi havas revojn de fina venko? [Are you also an Esperantist? Do you have dreams of the final victory?] Or is this only a rule for you and your theories?

All the people who have promised to cure cancer have failed, many causing vast suffering in their wake. Hundreds and hundreds of other people have provided tools to blunt the impact of various types of cancer, bit by bit driving back the scourge of cancer in part by recognizing the breadth of the problem and the power and lack thereof of their solutions. Even if we assumed your claim about who wrote the works of the British Renaissance were correct, that's not going to change the real issues that cause war in the 21st century.

https://www.newworldencyclopedia.org/entry/Tower_of_Babel
https://en.wikipedia.org/wiki/Tower_of_Babel ...


You start with a copy of Wikipedia and Wikipedia itself, as if they were two separate works. Wikipedia mentions the number in a section called History:

"The Book of Genesis does not mention how tall the tower was. ... The Book of Jubilees mentions the tower's height as being 5,433 cubits and 2 palms, or 2,484 m (8,150 ft) ... The Third Apocalypse of Baruch mentions that the 'tower of strife' reached a height of 463 cubits, or 211.8 m (695 ft) ... Gregory of Tours writing c. 594, quotes the earlier historian Orosius (c. 417) as saying the tower was ... two hundred (91.5 m or 300 ft) high ... A typical medieval account is given by Giovanni Villani (1300): He relates that ... it was already 4,000 paces high ... The 14th-century traveler John Mandeville also included an account of the tower and reported that its height had been 64 furlongs, or 13 km (8 mi) ... The 17th-century historian Verstegan provides yet another figure – quoting Isidore, he says that the tower was 5,164 paces high, or 7.6 km (4.7 mi)..."

That demonstrates that it is listed as one source among many of the tower's height, and thus not at all "impacting humanity's perception of the precision of Biblical concepts"; you could in fact remove all mention of Verstegan and his measurement from the article and still leave much the same impression.

It's followed by a book from the 17th century, and two books from the 19th. That says nothing about what humans believe in the 21st century.

You and the rest of humanity should spike at maximum skepticism at all times. It is absolutely necessary to be skeptical about all things. ... After the data told me the accepted historical narrative is false, my skepticism was spiked,

That's not maximum skepticism. You interpreted the raw data to say the accepted historical narrative is false in the direction of your belief system (e.g. Intuitively, I have a hard time believing JFK could have written an article of any type,). That's when skepticism is most important and most often lost; when you draw the conclusions you want to hear and that you expect. You can't run the data through a pile of processes, extracting certain data and measuring it in certain ways, and act like you're doing a neutral process.

Verstegan's proven long life

Proven. Seriously? You take someone who is said to have lived from 1585 or 1586 to 1640 in Antwerp, dying there, and you contradict that whole statement and then call his lifespan "proven", as if it weren't supported by the same sources that tell you he was in the Netherlands all that time.

774faktorovich
Apr 1, 2022, 2:34 am

>773 prosfilaes: No, I do not speak Esperanto.

Yes, cancer can be cured if professors/academics could not purchase grants/fellowships to fund this research with bribes after paying for a paper degree and for a ghostwriter to write their research for them. And yes, if intelligence was required for political office and politicians could not hire ghostwriters to craft their speeches, no rational intelligent politician would reach the conclusion that war is a solution. By accurately finding ghostwriting, humanity will be spared from grave idiocy. If the people who could understand the research they were pitching in fellowship proposals actually won fellowships/grants, they would actually make progress in curing cancer and the other diseases. Thus, anybody who is for maintaining society as it now is has doomed themselves to many ills that could have been prevented if intelligence had the reins.

Good point. Verstegan was clearly satirizing these other absurd numbers for the size of Babel. His intention was not to be believable, but rather to subversively argue that any such precision added after the initial holy book was written would have been impossible.

"You interpreted the raw data to say the accepted historical narrative is false in the direction of your belief system". No. I board member of my PLJ journal asked me to check on the attribution of the Renaissance after I had researched the 18th century; I did not have any bias regarding who wrote the texts of the Renaissance until I ran the calculations and they gave me the answer. The historical narrative was proven to be false through my findings, both the computational-linguistic data, and the handwriting, documentary, confessional, and various other types of evidence. "(e.g. Intuitively, I have a hard time believing JFK could have written an article of any type,)." As usual, you are extracting a point, and ignoring the rest of my statement; I began with this intuitive hypothesis, before explaining that I would have to evaluate JFK's writing samples, biographical evidence etc. to reach an actual attribution conclusion, or to check if my intuitive sense was accurate or not. "That's when skepticism is most important and most often lost; when you draw the conclusions you want to hear and that you expect. You can't run the data through a pile of processes, extracting certain data and measuring it in certain ways, and act like you're doing a neutral process." Based on my research into other computational-linguists' methods, yes, all of them manipulate data to arrive at non-neutral results that match the current attributions, or only shift a few bylines between texts. My method is the only one that has arrived at an accurate re-attribution because it is indeed the only one that is unbiased and neutral.

The standard estimate is that Verstegan was born in around 1550. I did not say he was born in 1585. He lived in London in his youth; this is verified with documentary evidence, such as his attendance record in Oxford. Antwerp was part of Netherlands between 1815 and 1839.

775FAMeulstee
Apr 1, 2022, 8:04 am

>774 faktorovich: Antwerp was part of Netherlands between 1815 and 1839
Antwerp was part of the Netherlands UNTIL 1830 (or 1839*), when it became a city in Belgium. Before that time it was part of the Southern (Spanish/Austrian) Netherlands, since 1581, when it was separated from the Northern Netherlands by the Dutch Revolt.

* Belgium declared independence in 1830 and was recognised in 1839 with the Treaty of London. Most sources keep 1830 as the date.

776RebeccaJoyce
Apr 1, 2022, 8:08 am

This user has been deleted because they were considered spam.

777anglemark
Edited: Apr 1, 2022, 12:05 pm

>768 prosfilaes:
I could read a scholar of Renaissance English
That strains the definition of "scholar of Renaissance English". Post 422 by Petroglyph is a good summary of why.

I also question whether a researcher who does not know what a holograph manuscript is could be considered a handwriting expert (see https://anaphoraliterary.com/journals/plj/plj-excerpts/book-reviews-spring-2019/ , the review of Daniel Defoe: Master of Fictions: His Life and Works by the very highly regarded professor of literature Maximilian Novak; a review which ends on the following note: "Beyond blindly following past falsities, Novak has “added a number of works to the canon, from a holograph manuscript I found in the William Andrews Clark Memorial Library to pamphlets which to my mind reflect his style and thought. In each case I have published my reasons for believing these works were by Defoe, and I have not changed my mind about them.” He ends the argument there and moves on to a different point in the next section. This conclusion proves Novak’s central bias: he cannot dismantle an industry he helped to build. “Holograph”? The dictionaries I consulted indicate this is not a real world. Perhaps he meant to say “hologram”: but why would 18th century texts be in 3D? More likely it is a typo for “monograph”, but still “no”, as Defoe did not write scholarly books. Adding a 3D modern image or a scholarly monograph would be appropriately absurd for the rest of this discussion, but definitely would counter any reasonable boundaries of scholarship."

You couldn't make this stuff up.

(edited to add: this is Linnéa.)

778lilithcat
Apr 1, 2022, 10:00 am

>777 anglemark:

“Holograph”? The dictionaries I consulted indicate this is not a real world {sic}.

One must wonder which dictionaries were consulted.

OED: "Of a deed, letter, or document: Wholly written by the person in whose name it appears."
Merriam-Webster: ": a document wholly in the handwriting of its author"
Collins: "written entirely in the handwriting of the person under whose name it appears "

Heck, even Wikipedia knows what "holograph" means: https://en.wikipedia.org/wiki/Autograph_(manuscript)

779Keeline
Apr 1, 2022, 10:04 am

>777 anglemark: Wow. Just wow. One cannot spend a day in the field of handwriting studies without encountering that field’s earlier specialized definition of “holograph.”

James

780faktorovich
Apr 1, 2022, 12:17 pm

>775 FAMeulstee: Thanks for clarifying this point.

781faktorovich
Apr 1, 2022, 12:29 pm

>777 anglemark: As you can see from my explanation, I looked up the definition of "holograph" and for some reason this term did not come up in the dictionaries I checked, so I checked similar words like "monograph" and "hologram". I do not know what might have gone awry that kept me from finding the definition for "holograph" that I found in Merriam-Webster etc. when I searched for this term today. It is still true that to say "holograph manuscript" is redundant, as a "holograph" is "a document wholly in the handwriting of its author", so simply stating "holograph" is sufficient, rather than effectively saying "handwritten manuscript manuscript". It is amazing that you have gone out of your way to read through over 600 of my reviews to find this single glitch. And you think a single error like this proves that I am not a "handwriting expert"? One does not need to be a handwriting expert to be able to surmise the similarity between the handwritings of the different bylines in the groups I form in the handwriting comparison file.

782faktorovich
Apr 1, 2022, 12:44 pm

>779 Keeline: No, the term "holograph" is not used in all textbooks on this subject. As an experiment, I searched Google Books for textbooks about handwriting analysis, and out of the six I checked, only one mentioned the term "holograph". These books didn't:

https://www.google.com/books/edition/Handwriting_Analysis/g3jISguoksoC?hl=en&amp...
https://www.google.com/books/edition/Handwriting_Analysis_Plain_Simple/MhR_DwAAQ...
https://www.google.com/books/edition/Handwriting_Analysis/UTR8DwAAQBAJ?hl=en&amp...
https://www.google.com/books/edition/Graphology/m1sRBAAAQBAJ?hl=en&gbpv=1&am...
https://www.google.com/books/edition/Sex_Lies_and_Handwriting/kUNzGomsKI4C?hl=en...

And this last one only mentioned it once in a diagram description. The rest of that figure title is: “Plate 111. Carefully dotted i’s and flat r-tops say the same things when written with toes as when written with the hand. Billy Richard’s holograph”:

https://www.google.com/books/edition/Handwriting_Analysis/B8vn-03mmGsC?hl=en&amp...

Thus, only somebody who is unfamiliar with this field would think "holograph" is typically used in it; the standard is to just refer to the "handwriting", since if a piece includes handwriting, that is what the analyst is referring to. In other words, a book called "Holograph Analysis" would not attract readers because the unfamiliar term "holograph" would confuse them as to the topic.

For example, there are unsurprisingly 95 mentions of "handwriting" in: https://www.google.com/books/edition/Sex_Lies_and_Handwriting/kUNzGomsKI4C?hl=en...
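As a rough illustration of the kind of tally being described (how often given terms appear across a set of books), here is a minimal Python sketch. It assumes the books have already been saved locally as plain-text files; the folder name and the term list are placeholders, since Google Books itself is only searchable through its own interface.

import re
from pathlib import Path
from collections import Counter

TERMS = ["holograph", "handwriting"]  # terms to tally

def term_counts(path):
    # Count occurrences of each target term in one plain-text file.
    words = re.findall(r"[a-z]+", path.read_text(encoding="utf-8").lower())
    counts = Counter(words)
    return {t: counts[t] for t in TERMS}

# "books/" is a placeholder folder of locally saved plain-text books.
for book in sorted(Path("books").glob("*.txt")):
    print(book.name, term_counts(book))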

783paradoxosalpha
Modificato: Apr 1, 2022, 1:12 pm

Wow. Digging in again.

"Holograph" and "handwriting" are not interchangeable terms, as the definition quoted by >778 lilithcat: shows.

The term "holograph" used to qualify documents is not obscure or marginal.

784lilithcat
Apr 1, 2022, 1:23 pm

>781 faktorovich:

holograph manuscript" is redundant

"Holograph" is both a noun and an adjective. So, no, it's not necessarily redundant. It's a term used all the time by libraries and book dealers. One might say, "The library owns both a holograph manuscript and a typewritten manuscript of that work."

785anglemark
Modificato: Apr 1, 2022, 2:20 pm

>781 faktorovich: It is amazing that you have gone out of your way to read through over 600 of my reviews to find this single glitch.

Three incorrect assertions. I have not read 600 of your reviews, it is not a "glitch", and there are multiple errors in the short extract I quoted.

There are a few different ways that people might react when they learn something new, or find out that they have misunderstood something in the past. Trying to prove that nobody actually could be expected to know the thing, presenting as "evidence" half a dozen books about subjects that are only distantly related to the actual topic, is one kind of reaction, though not the one I'd recommend.

-Linnéa

786paradoxosalpha
Modificato: Apr 1, 2022, 2:28 pm

>784 lilithcat:

While I'll grant that it sees use, "typewritten manuscript" is a bit of an oxymoron, and "typescript" would suffice. "Holograph manuscript" is just unimpeachable usage, though!

787lilithcat
Apr 1, 2022, 2:35 pm

>786 paradoxosalpha:

"typewritten manuscript" is a bit of an oxymoron

In some contexts, yes. But in publishing, "manuscript" can simply mean the text submitted for publication. (Interestingly, at Wikipedia - granted not the most reliable of sources - "typescript" redirects to the "manuscript" page.)

788faktorovich
Apr 1, 2022, 8:54 pm

>784 lilithcat: Why are you citing a hypothetical? Have you not been able to find a library that refers to a "holograph manuscript" specifically, using the redundant form? "Holograph" being an adjective does not change its redundancy by the standard definition; for example, "hairy" means "covered with hair"; so to say "hairy hair" would be very redundant.

789faktorovich
Apr 1, 2022, 9:02 pm

>785 anglemark: You have taken the quote from my review out of context; you have not used proper single/double quotations around it to make it clear; and I think you introduced some typos of your own. If there were other errors in the passage, you should just quote the full review, and include full explanations for what other errors you have found, so I can clarify any potential misunderstandings.

The series includes translations of focal texts that are essential to establishing the patterns of these ghostwriters' styles. These translations are accompanied by extensive introductions that explain their self-attributed letters, handwriting samples (in comparison with their pseudonyms' handwritings), full reviews of past scholarship that mentioned these texts, as well as annotations that explain linguistic/citation echoes between each text and other Renaissance texts. Volumes 1-2 explain not only the computational method, but also offer biographical, historical, fiscal and various other types of very "related" subjects. Please clarify what you are trying to say.

791faktorovich
Apr 1, 2022, 9:11 pm

>787 lilithcat: The standard term used in the archives for a "typewritten manuscript" is "transcript" when referring to a typed document that has altered the medium of the preceding work (audio, handwritten etc.); for example, taking a handwritten work and typing it up to make it more accessible. In the Renaissance, simply typesetting a text and printing a single copy of it was considered a publication or a book (many books were only printed as a single copy). The distinction between a transcript and a book is thus clear, whereas if one uses "typewritten manuscript", readers will be confused as to whether it is some sort of strange specification that distinguishes it as not-a-book-but-typeset.

792faktorovich
Apr 1, 2022, 9:15 pm

>790 lilithcat: https://repository.tcu.edu/handle/116099117/6131 stands out because it is a combination of typed and handwritten pages; other examples also include typed and handwritten text. In every case where the manuscript referred to is a "holograph manuscript", there should not also be typed text, or the description should have specified: "a holographic and typed manuscript".

793lilithcat
Modificato: Apr 1, 2022, 9:19 pm

>791 faktorovich:

You are confusing "transcript" and "typescript".

794Stevil2001
Apr 1, 2022, 9:25 pm

For a person who had never heard the word "holograph" before today you sure do have a lot of opinions on its usage...

Anyway, this thread is the gift that keeps on giving. Every time I think it's dead, it surprises me again. Happy April Fools' Day, everyone!

795faktorovich
Apr 2, 2022, 11:41 am

>793 lilithcat: No, I am not confusing them. I explain that a "transcript" is derived from an earlier version, such as a handwritten text or audio recording. I also explained that a typed script or a typeset document that had been printed was equivalent to a "book" in the Renaissance (as there was no distinction between private printing with a home printer and publishing with a commercial printer). Thus a "typescript" was equivalent to a "book". The term "typescript" was never used across the Renaissance, so it is not relevant for describing documents from that period.

796faktorovich
Apr 2, 2022, 11:42 am

>794 Stevil2001: A researcher studies whatever term, concept or problem is presented, and comes up with evidence that supports opinions. There is nothing surprising about this basic process.

797spiphany
Apr 2, 2022, 2:37 pm

>796 faktorovich: "A researcher ... comes up with evidence that supports opinions"

Well, that certainly explains a lot.
I'll remember to tell the scholars I work with that they've gotten the workflow backwards all this time.

798anglemark
Apr 2, 2022, 3:40 pm

>789 faktorovich: Thing is, other people can read, and can see for themselves that the quote from your review is attributed to you with quotation marks and all. And they can easily check that the text, which was copied and pasted from your website, is unaltered (it's a matter of 30 seconds to check that) and that there is no missing context.

What you have been getting here for months now are actual peer reviews and expert critiques of your methodology, your writing, and your various claims of expertise – the kind of thing you miss out on when you publish your own books and articles independently. It is a pretty amazing group of experts who frequent these forums. But it doesn't look like you have taken advantage of that at all, since when you are called out on an error, your reaction is still that the reader must have misunderstood something. Or else you simply ignore the comments, and later claim that your methods are robust and your results are conclusive and true beyond reasonable doubt, as if the thread didn't contain thousands of words explaining in detail some of the fundamental flaws. I'm starting to succumb to Brandolini's law here, as better scholars than I have done before me.

-Linnéa

799Keeline
Apr 2, 2022, 6:13 pm

A general purpose dictionary is not helpful when dealing with a specialized topic that uses its own connotations for words that may be used in a more general sense.

The ILAB.org site has a copy of John Carter's ABC for Book Collectors (8th edition) and it has many useful definitions used in the field of book collecting and study. It has this to say about "holograph":

HOLOGRAPH
Adjective (not noun): meaning entirely in the handwriting of the author, and designed to distinguish documents wholly thus written from those to which the author only appended his signature or autograph. It is commonly used of substantial documents, such as the complete text of a literary or other work, as distinct from autograph letters, annotations, inscriptions, etc.


I cannot agree with the notion that a word used in the modern study of books should not be used about literature of past centuries because that usage was not in use then. Words with specialized definitions in a field of study, sometimes called "jargon," are designed to convey more complex explanations or descriptions among people working in that field.

In this phrase:

Thus, anybody who is for maintaining society as it now is has doomed themselves to many ills that could have been prevented if intelligence had the reigns.


I suspect you meant to use its homonym, "reins" as in the leashes to control domesticated animals, and not the period or scope of a ruler such as a king or queen or president.

I think the OED and Merriam-Webster Collegiate dictionaries are fine resources. However, in the past half century or more dictionaries have become repositories to show how words are used or even misused in general practice. They are now more "descriptive" than "prescriptive" to say what a definition should be. This change muddles communication. In a world where second languages and voice-to-text systems are in use, you can have a lot of wrong usage, even in edited and published forms.

James

800Crypto-Willobie
Apr 2, 2022, 6:17 pm


800!

801faktorovich
Apr 2, 2022, 8:35 pm

>797 spiphany: Opinions can come before or after the evidence. There is no wrong order between these. The demands of a research question dictate which of these should come first. For example, if somebody proposes an opinion and asks me if I agree with it, I have formed an opinion instinctively just by being asked. Or if I begin an experiment not having an opinion on the outcome, the evidence might point to a conclusion that forms an opinion.

802faktorovich
Apr 2, 2022, 8:44 pm

>798 anglemark: As I explained, whoever copied and pasted the quote introduced the error of double quotation marks around the quotes within quotes as well as around the quote as a whole. If they did not want to change the interior double quotation marks, they should have changed the exterior quotes to British single quotation marks. It's not a matter of being able to read the letters, but rather of being able to distinguish what is being quoted from what I am saying about it. And yes, the rest of the review is the "context" that is missing from the quote, if you are saying there are some other errors in this review that you have not specified. You are clearly confused by what I am saying in the excerpt, so you would logically want to read the full review to orient yourself.

I have not learned anything new from this discussion that is relevant to my research. I have read and replied to all of the points raised, countering how they are wrong. It might indeed be more difficult to counter the falsehoods you guys have been stating than simply to present my true scientific findings, but as I explained, I am going to keep answering your questions to help anybody who comes to this discussion with a sincere desire to understand my findings and the true version of British Renaissance history.

803faktorovich
Apr 2, 2022, 8:58 pm

>799 Keeline: LEME is a collection of dictionaries created across the Renaissance. It frequently includes contradictory or very different definitions for the same word in the different dictionaries. It would be absurd for any serious linguist to claim that all but one of any given set of definitions is "wrong". Such a quality judgement would betray the researcher's bias, as the belief of the dictionary-writer about a given word might reflect truer current spoken patterns in the Renaissance than we can derive centuries later. Thus, a modern general-purpose dictionary is just as good as a specialized dictionary, as most users will check a general dictionary when encountering a word they do not know, such as "holograph". While there are clearly some "jargon" words that have to be applied in modern scholarship that were not used in the Renaissance, I had explained that the specific term "holograph" is not necessary in the study of the Renaissance because all printed texts are simply books, and the rest of the manuscripts are handwritten.

Yes, I meant "reigns": "the period during which someone or something is predominant or preeminent". I meant to say that if intelligent people were predominant, they would fix society's ills.

There are plenty of "errors" or irrational definitions/ spellings etc. in all dictionaries. Why and how these errors start and develop is one of the questions I am asking as I translate one of the early dictionaries, or Verstegan's "Restitution".

804MrAndrew
Apr 3, 2022, 6:57 am

Sorry, this minor point is just going to continue to bug me unless i say it. So, the statement:
"Thus, anybody who is for maintaining society as it now is has doomed themselves to many ills that could have been prevented if intelligence had the reigns."

Would have been better phrased:
"Thus, anybody who is for maintaining society as it now is has doomed themselves to many ills that could have been prevented if intelligence reigned."

Would you agree?

805faktorovich
Apr 3, 2022, 9:26 am

>804 MrAndrew: This is a note I wrote yesterday for Verstegan's "Restitution":

Cimbrica Chersonesus In Modern English, Chersonesus typically refers to a Greek colony in the Crimean Peninsula. However, on one of Ptolemy’s maps, he labeled as the Chersonese Aurea (Latin/Ancient Greek: Golden Peninsula) a region that is now called the Malay Peninsula (including parts of Malaysia and Thailand). The Cimbrian language is a variant of Upper German in a region of Italy near Verona; in the following paragraph, Verstegan questions if Tacitus might have mislabeled the Saxons as Cimbrians. The intended reference among Ptolemy’s maps is to “Cimbricae Chersonesi” as a single term, and not as two different locations; it means Cimbrian Peninsula, which is located by the Alpes Mountains and Gallia Belgica (a province in the Roman Empire around France, Belgium, the Netherlands and Germany); though the latitude 56’ and longitude 32’ that Ptolemy gives for this region place it closer to Moscow than to France. The meaning Verstegan and Ptolemy intended is closest to the modern term Cimbrian Peninsula, which is now equated with Jutland, which includes parts of Denmark and Germany.

Now as for your question, it is possible that Intelligence needs to take several positions of power at the same time, hence the plural "reigns". And my point is that many intelligent people should take power in their fields, and not that there should be a tyranny by a single intelligent Person who reigns over all. If you are struggling with editoritis, feel free to edit the above annotation; this Cimbrian Peninsula is where Verstegan argued the Anglo/English-Saxon people originated from. He made up the term in this passage: "These Wood-Saxons, having before only been called Saxons, now (as it seems) were for distinction called English-Saxons, a name perhaps abbreviated from Englandish-Saxons by reason of that part or province of the Cimbrian Peninsula, called England, wherein they inhabited." And the term "Anglo-Saxon" is still the standard term for describing "Old English" and "Germanic Englishmen".

806susanbooks
Modificato: Apr 3, 2022, 10:28 am

>798 anglemark: Thank you for introducing me to Brandolini's Law. It's the very reason I've gone silent here. Every once in a while I marvel that this conversation is still going on, look in on it, start typing an exasperated reply, and delete the whole thing with a sigh & rolled eyes.

Faktorovich, you do yourself no favors in debating basic terms like "holograph." Did grad school teach you nothing? Would your professors have let you get away with this in class?

To quote >798 anglemark:, "What you have been getting here for months now are actual peer reviews and expert critiques of your methodology, your writing, and your various claims of expertise – the kind of thing you miss out on when you publish your own books and articles independently. It is a pretty amazing group of experts who frequent these forums. But it doesn't look like you have taken advantage of that at all."

At the very least, rather than arguing every single point, down to basic definitions (on which you are embarrassingly wrong), recognize that your interlocutors are indeed experts & their points deserve consideration rather than absurd knee-jerk defensiveness. You are not behaving like a scholar.

I wrote an essay on Dante a while ago & when I got it back from the editors, there was a query about something really basic, something so stupid. I was embarrassed beyond belief that I wrote it at all. It was sloppy writing & worse thinking. So I doubled down & wrote five paragraphs defending it. No. Actually, I took it out & -- shazzam -- everything around it was tighter, more sensible and coherent. I was thankful for my editors' advice. That's how scholarship works.

807faktorovich
Apr 3, 2022, 9:48 pm

>806 susanbooks: Would my professors have let me "get away" with debating "terms" in class? Now that you mention it, I had to keep my hand up across most of a class before a biased, chauvinist professor would call on a "talkative" woman like me. But I have been teaching college across the past decade, and have not been in a classroom as a student since 2010. I absolutely would encourage my students to debate and question all terms. It would be especially necessary for a student to defend the definition of such a term if all of the other students in the class ganged up to insult a given student's understanding of the term. I would be assisting bullies if I allowed them to attack the student, and forbade the student from defending him or herself.

As I have said multiple times now without you understanding me, I have previously published my computational-linguistic/attribution research in journals such as "Journal of Information Ethics", and other articles are forthcoming in established journals such as "Critical Survey". I do not want or need any "review", especially since nobody here has offered a single word that is relevant to BRRAM that I had not previously come across through my own research. I am here to assist you with understanding my research, which you clearly have not been trying to do, as only one of you has even glanced inside any of BRRAM's volumes.

My "interlocutors" are almost all using pseudonyms or fake names, and are not disclosing their credentials or biases, thus you are technically not "scholars", until you prove this to be the case.

I just submitted the final version of my "Critical Survey" essay, and the editor only made light proofreading adjustments to change some spellings to the British variants. I have not had any major changes suggested by my book editors (the two books I published with McFarland) or article editors since around 2005 when I started publishing my scholarship. I certainly have never had an editor who accepted my work suggest that I remove five paragraphs from any of my projects, as I never digress from the points that are essential for proving the argument. You have either had incredibly horrid editors, or you are an incredibly horrid writer and should not be advising others on how scholarship works.

808Stevil2001
Apr 3, 2022, 9:53 pm

I have not had any major changes suggested by my book editors (the two books I published with McFarland) or article editors since around 2005 when I started publishing my scholarship. I certainly have never had an editor who accepted my work suggest that I remove five paragraphs from any of my projects, as I never digress from the points that are essential for proving the argument. You have either had incredibly horrid editors, or you are an incredibly horrid writer and should not be advising others on how scholarship works.

This is so wrongheaded that I am completely boggled, but it is par for the course in this conversation.

809MrAndrew
Apr 4, 2022, 6:02 am

>805 faktorovich: So, that would be "no", then.

810susanbooks
Apr 4, 2022, 9:40 am

>807 faktorovich: The five paragraphs were a joke at your expense, dear, as you do go on & on defending defenseless points. I've never been asked to remove 5 paragraphs from anything I've written. Careful reading is also part of scholarship.

811faktorovich
Modificato: Apr 4, 2022, 1:25 pm

>809 MrAndrew: I had to give a specific answer to your question because you were picking apart a single word without getting anywhere, while my current research is attempting to decipher the roots of a dictionary full of words. If you are searching for a linguistic challenge, you could take me up on attempting to dig further. I just found a new citation of my "Rebellion as Genre" book in a Cambridge University Press title, "The 1857 Indian Uprising and the Politics of Commemoration" (2022): https://books.google.com/books?id=Cy9lEAAAQBAJ&vq=faktorovich&lr=&so...

812Keeline
Apr 4, 2022, 10:12 pm

>807 faktorovich:

two books I published with McFarland


I've had my own connections with the editors and products of McFarland. I even considered using them for one of my works. They are generally a bit hands-off with regard to content. They seem to be more interested in keeping books to an economical length.

This means that the quality of the books is largely up to the author. I've seen examples which are very good and others which are sadly not up to academic standards. They are not what most would call a rigorous academic publisher.

I see that the Journal of Information Ethics is also published by McFarland (since 1992). It is not in my scope of interest so I was unfamiliar with it until it was mentioned.

James

813faktorovich
Modificato: Apr 5, 2022, 1:24 pm

>812 Keeline: "Critical Survey" and the rest of my publishers are not with McFarland. McFarland is on various publisher ranking lists such as: https://www.libraryofsocialscience.com/newsletter/posts/2014/2014-10-20-NUPs.htm.... With 14 citations of my "Formulas" book with McFarland, I am in the top 20% or so of published researchers.

"First, almost 44% of all published manuscripts are never cited. If you have even 1 citations for a manuscript you are already (almost!) in the top half (top 55.8%). With 10 or more citations, your work is now in the top 24% of the most cited work worldwide; this increased to the top 1.8% as you reach 100 or more citations." https://lucbeaulieu.com/2015/11/19/how-many-citations-are-actually-a-lot-of-cita...

Additionally, "Faculty at history departments in doctoral and research universities had at least one book at tenure (the average was 1.12 books at tenure). Among faculty at baccalaureate- and master's-degree granting institutions only about 3 in every 4 faculty had published a book." https://www.historians.org/publications-and-directories/perspectives-on-history/.... Thus, the fact that I have published two scholarly books with McFarland (even if you do not count the few other books with other publishers and over 50 of my own books with Anaphora) puts me in the top percentile even among tenured faculty in research universities.

What you feel intuitively about a publisher like McFarland is irrelevant; the citation-count is how publishers are quantitatively evaluated; and if Cambridge University Press authors are citing my McFarland books, they do not see them as in any way inferior. The significance is in the research, and making research accessible to the world. You have to just read my series to evaluate it. By continuing to refuse to read BRRAM, you are just spinning in spiteful loathing that is entirely unproductive.

814clamairy
Modificato: Apr 5, 2022, 1:08 pm

>813 faktorovich: "By continuing to refuse to read BRRAM, you are just spinning in spiteful loathing that is entirely unproductive."

This statement sums up in a nutshell why so many of us are unwilling to take you seriously. You can't take any kind of constructive criticism or make any kind of rebuttal without adding a nasty or snarky comment. What kind of scholarship is this?

815faktorovich
Apr 5, 2022, 9:33 pm

>814 clamairy: All of you have a tendency to project whatever is true about yourself onto whoever you are arguing with. In this instance you have taken a true observation I made regarding your continuing spiteful attacks, based on flaws you guys are imagining in my character, and have attempted simply to echo back to me that I am guilty of such spite. Since almost all of you are using pseudonyms, I could not make a personal "nasty" attack on your characters, as I do not have access to your resumes, as you have access to mine. And in your concluding question you seem to indicate that you have no idea what the field of my scholarship is, having just arrived and decided to project that message anyway.

816prosfilaes
Apr 5, 2022, 11:01 pm

>813 faktorovich: the citation-count is how publishers are quantitatively evaluated;

This goes to the heart of the discussion. Citation count is one way, not the way, publishers are quantitatively evaluated. And it, like every quantitative measure, has flaws and limits, which is why there are multiple options.

817faktorovich
Apr 6, 2022, 12:38 am

>816 prosfilaes: Yes, another among these options is the top-publishers ranking list(s), which I also cited for McFarland. Perhaps you should focus on the meaning of this discussion instead of on its "heart"?

818clamairy
Apr 6, 2022, 12:23 pm

>815 faktorovich: Do you actually believe your tone has no impact on readers' attitudes towards you and your work? If so, perhaps you'd better re-read this post: https://www.librarything.com/topic/337240#7763925

819faktorovich
Apr 6, 2022, 8:53 pm

>818 clamairy: "This rudeness is a sauce to his good wit,/ Which gives men stomach to digest his words/ With better appetite." -"Julius Caesar", "Shakespeare"/Jonson

If all of you guys were devoid of rudeness, I would perhaps be able to avoid this witty sauce; but as it stands, a rude tone will have to coat our stomachs. I would very much prefer if all tone (rude and polite) was subtracted from this discussion, and we just engaged in a rational, unemotional argument about the evidence and the conclusions.

820Keeline
Modificato: Apr 8, 2022, 12:20 am

>813 faktorovich:

For about a decade, the Edward Stratemeyer entry on Wikipedia had a seriously flawed statement, and this relates to McFarland as an academic publisher and the lack of fact-checking and editorial rigor in their publications.

The entry claimed that Gilbert Patten, the author best known for the Frank Merriwell stories (though not all of them) hired Edward Stratemeyer at Street & Smith in the early 1890s. This came from a survey of sports fiction by Dinan.

However, this is factually backwards. It was Stratemeyer, who spent part of each week as an associate editor for Street & Smith, who bought stories from Patten at the beginnings of the latter's career with that publisher. This is stated by Patten in his posthumous memoir, Mr. Frank Merriwell.

Somehow Dinan got this wrong and other scholars have noted many other problems with this McFarland-published book. But how this flawed source finds its way into Wikipedia says something about their rules and culture. Wikipedia shuns subject matter experts and relies on published items without regard to whether they are accurate or quality sources. Briefly, if the Wikipedia editors can find it online, they will use it freely. In terms of online exposure, the Patten memoir is available on Google Books only in "snippet view" so you have to know that the passage is there and the kind of terms to search for. McFarland books, on the other hand, have generous "preview" where about half of the content can be read on Google Books. Hence, this kind of material ranks high on general Google searches.

So, having a McFarland-published book be frequently cited by others is not as extraordinary as it seems. Items that are freely available online are likely to be cited as opposed to books which may have little or no online exposure.

Another McFarland-published book is called Children's Fiction Series : A Bibliography, 1850-1950 (1997) by Philip H. Young. It is filled with spare checklists of titles from series (and publisher libraries) that are often incomplete and out of sequence. Good and complete information for the series covered had been available for at least 20 years before, so there's little excuse for McFarland publishing the Young book. There cannot have been any review of the content beyond spell check.

But McFarland also has issued some quality books in the field I monitor. It seems that the quality is up to the author and not the editors or the publisher. That is why I wrote that McFarland is not a rigorous academic publisher. They exist, like Edwin Mellen Press, to provide a publishing vehicle for people working at academic institutions who need to publish regularly to maintain or advance their positions. For this reason, McFarland reduced their royalty contracts from 10% to 8% a few years ago. Their authors are not able to be compensated fairly for their work because most are employed by academic centers and expected to make their living from that and not from writing and publishing books.

Smaller royalties exist in the series book world. Edward Stratemeyer negotiated a 4% deal with Grosset & Dunlap — 2¢ on a 50¢ book — which helped to boost sales numbers. The independent authors were paid 5% — 2.5¢ on 50¢ books. In the long run, the Stratemeyer Syndicate books sold better because the publisher was motivated to sell them more. They made more money on each copy sold. However, for his own writing, Stratemeyer insisted on a 10% royalty contract. This was often 12.5¢ on $1.25 books for Lothrop, Lee & Shepard of Boston.

In my 33+ years in this field of writing about juvenile series books, my research (published and otherwise) has been cited innumerable times. I don't keep track. My goal is to help people find accurate information about the history and authorship of these books. I have helped many with their dissertation-level research and published books. But this forum really isn't the place to present competing credentials or lists of publications.

As far as your statement:

By continuing to refuse to read BRRAM, you are just spinning in spiteful loathing that is entirely unproductive.


My own full-time work and research projects and volunteer duties fill my available time. While I have a limited curiosity about how this project of yours began, I don't have dozens of hours to invest in reading each of your claims. I might request a copy later on, but I have a conference presentation on Wednesday and other obligations, and I won't make such a request unless I can give it a fair reading. My posts here are reactions to what is written here. I don't write with malice but I am not ready to accept the broad claims made that anything worth reading from this period comes down to half a dozen people and, by extension, that such a practice accounts for the majority of writing since. No one has ghostwritten my posts here or my writings in my fields.

James

821faktorovich
Apr 7, 2022, 9:24 pm

>820 Keeline: Just earlier today, I discovered that Verstegan appears to have made up the circa-1300 Italian "Favio" compass-invention myth, which is explained in this article, https://www.tandfonline.com/doi/abs/10.1080/00253359.1937.10657223, but nobody has previously pointed to Verstegan as this falsehood's inventor; and up until this article in 1937 (or for 3 centuries) this Favio myth continued to be repeated without correction. There are some errors in all scholarly books, even if those errors have not yet been noticed by other scholars, as each point has not been fully explored for possible falsehoods. I could find a mundane error like the one you mention in some Cambridge or Oxford University Press book to argue they are equally likely to leave mistakes as McFarland. But the enormity of the entire Renaissance being misattributed makes such minor adjustments irrelevant in comparison. I wrote a one-star review of at least one of Cambridge's titles for my PLJ journal that will serve as an example:
---
John Bowers. Deriving Syntactic Relations: Cambridge Studies in Linguistics: 151. 296pp. ISBN: 978-1-107-09675-2. Cambridge: Cambridge University Press, 2018.

*

Linguistics is a field of study that is undeniably necessary in our present times. Why? Every professor and high school teacher knows the answer. The volume of plagiarized essays is overwhelming. The essay mills are producing countless nonsensical and repetitive essays and selling them for a few dollars. Students are stealing information from Wikipedia and plastering it into their “creations”. Aside from this momentous problem, the volume of copyright lawsuits that accuses writers, musicians and others of theft of original content is climbing. There is a dire need for coherent linguistic analysis in courtrooms and classrooms. Instead of taking on these practical problems, most popular or established linguists engage in the practice of nonsense linguistics. It is easy to spot contributions to this field because they inevitably quote as their inspiration Chomsky or Lacan. The second most telling element is repetition of the same nonsensical phrases or even sentences with slight variations, as if the author is practicing self-plagiarism for the sake of proving that if readers cannot comprehend nonsense, they will assume the work is brilliant simply to avoid reading it. The repetitions in this text begin on its cover. The front inside flap replicates the book summary paragraphs from the back cover. Just imagine a book with two identical copies of the author’s biography on different parts of the cover, or what if two of these biographies were repeated one after the other on the back cover alone… The author is likely to have won a contract to publish this convoluted title because he is a professor in the Department of Linguistics at Cornell University. This is his third book in this field; one of these earlier studies was Arguments as Relations, and its title alone implies that it is also cyclical and nonsensical. What do you think is the logical relationship between arguments and relations in general? Relationship between relations? Arguments about relatives? Arguments that relate different things to each other? If this contemplation about the meaning of the title is nonsensical, imagine what an analysis of a few sentences might be like… Funny enough, the first sentence of the “Introduction” touches on this “concept” of “relations”: “There have been a number of attempts in modern era to argue that the primitives of syntactic theory should be relations (or dependencies) between words rather than constituents” (1). Then he jumps into naming a dozen random past nonsensical studies from this field, as if these notes are going to explain the lack of meaning in the opening sentence. What possible part of human language can exclude words? Constituents, regardless as to what this term means, are words themselves. If every part of language is a word, how can anything be outside of this term and yet an element used to express written language? This type of a self-contradicting opening sentence works to keep readers from attempting to find fault with the argument. If nobody can interpret the meaning within an inherent self-contradiction, then it is incredibly difficult to prove that it is nonsensical with help from logic. If I began this review thus, “The beginning interpret meaning element goes to exclude naming in syntactic theory and professor studies words”; how would you begin criticizing this string of random words put together in a nonsensical pattern? Why would you want to spend hours attempting to decode such nonsense? 
The only difference between pure nonsense composed of completely randomly selected words from a dictionary and linguistic research nonsense is that the latter tends to repeat the same strings of nonsense, referring to earlier studies that also said the same nonsense as proof that the research has been verified by established researchers. In addition, this particular study uses diagrams and convoluted theories that attempt to appear like serious linguistic research. Many sections begin with no introductory sentence other than “Let’s consider next the following two orders: (11) a. Dem N Num A (=Cinque’s (6c))…” Then, he goes on to explain the relations between the different categories summarized, touching on their common roots. This particular section ends in the middle of a thought without coming to any conclusions, with the promise that this “is a question I return to in §4.1.8” (134-5). If any scholar is seriously attempting to explain any concept, he or she has to begin a thought by summarizing the concept, then offer an explanation and in the end summarize what the research means to digest it for the reader. In contrast, linguistic nonsense research begins in the middle of a random thought, introduces nonsensical or extremely convoluted research, and then moves on to other topics without explaining the relationship between the random words just presented. When will publishers begin rejecting nonsense, sparing researchers like myself valuable time, and allowing for the publication of truly useful linguistic studies?
---
You appear to have an irrational jealousy of my publications and citations when you go out of your way to argue that McFarland's offer of half of their books for free on Google Books makes them too easy to cite and therefore unfair game in the citation-counting game. The accessibility of a text for free to the public makes McFarland a better publisher, as it means they are achieving the main goal of any publisher, or bringing the text to readers.

I believe McFarland has had an 8% royalty since at least 2014. If you think the higher the royalty, the better the publisher; my Anaphora Literary Press pays a 50% royalty.

Email me whenever you or anybody else is ready to read my series; the offer to send a free pdf set stands.

822prosfilaes
Apr 7, 2022, 10:03 pm

>821 faktorovich: I wrote a one-star review of at least one of Cambridge's titles for my PLJ journal that will serve as an example:

I'm not sure why you think we'd be impressed with that. You don't appreciate a subject and you trash a book on that subject, and act like everyone should be working on your pet problem. Oddly enough, you have previously trashed books on the problem of using computational linguistics to figure out authors. Complaining about sparing you valuable time is a little outrageous; can you not figure out how to skip books when they don't fill your needs?

823faktorovich
Apr 8, 2022, 2:02 pm

>822 prosfilaes: Impressed? I posted the review as a rational piece of evidence to support my hypothesis that there are some errors in some of the books from all publishers, including top publishers like Cambridge. I "appreciate" linguistics very much: that's why I use linguistics in my research. I have not asked any of you to "work" on my "pet" or "pet problem". This discussion is about the problem of Renaissance attribution. It is not "my" problem. It is the world's problem. Yes, I have written negative reviews and also explained how other books on computational linguistics are incorrect in the assertions I have proven with my own research to be faulty. I have never "skipped" a book a publisher sent for my review upon request; if I ask for a book, even if I immediately discover it is not what I expected, or is horrid, I still carry out a full review. I even review most of the books I do not request, unless they are in pop genres that I cannot digest.

824anglemark
Apr 11, 2022, 9:29 am

>821 faktorovich:
I will be blunt. Throughout this thread you have demonstrated, over and over again, that you are not a linguist; this review shows very clearly that you do not even understand what linguistics is.

You also show a lack of consistency in your argumentation. Let's look at >74 faktorovich: where you say, about your own books, "Volumes 1-2 are 698 pages long and you have to read this full book before judging my method". That is a claim you have been making repeatedly. But in the review you quote from here, you say "If any scholar is seriously attempting to explain any concept, he or she has to begin a thought by summarizing the concept, then offer an explanation and in the end summarize what the research means to digest it for the reader". That is a comment on a cross-reference, of all things – an established (and very helpful) method to show a reader where they can get supplementary information, without unnecessary repetition. In addition, the section you comment on (section 4.1.3, which certainly does not end "in the middle of a thought without coming to any conclusions") is one part of a discussion on word order in different human languages. Is your argument that you would like 4.1.3 to start with a repetition of 4.1? At the same time, you also criticise the use of repeated sentences in the cover blurb and back flap – but in your own article in J. Inf. Ethics, the abstract consists of sentences copied verbatim or with very minimal rephrasing from the first and last paragraphs of the article. I assume you do not see that as problematic, but the result of that particular copypasting exercise is a text that is – to use your favourite word – nonsensical.

-Linnéa

825faktorovich
Apr 11, 2022, 1:32 pm

>824 anglemark: Your comments are false and indeed nonsensical. Mine are precise and logical.

1. My "J. Inf. Ethics" article does not include identical content between the abstract and the first and last paragraphs. I double checked, and I can quote these sections here and you would not find repetitions of even a single sentence. So you have made up a false accusation without even glancing at the article in question.

2. All of you guys keep making broad accusations that I "do not even understand what X is." For any X in such an accusation, you would have to read my mind to learn what I "understand", or you would have to find an instance where I use an erroneous definition for "linguistics" in general. Otherwise, you have to specify what about X you feel I have misunderstood. Given that I have written several books that discuss linguistic subjects, you would have plenty of material in which to find proof of errors. But instead of searching for this non-existent evidence, it is much easier to make broad insults about my intelligence in general. While it is easy to be insulting, it is not an intelligent form of communication, as you say a lot more about yourself when you stoop to false name-calling.

3. You have not read the book I was reviewing, and thus you cannot evaluate it. For example, you simply say the opposite is true of my precise summary of an error in the text: "section 4.1.3, which certainly does not end 'in the middle of a thought without coming to any conclusions'". Yes, it does indeed end in the middle of a thought. You would have to copy a couple of pages from the book in question to see how it repeats the previous pages and then comes to an abrupt stop without a conclusion. Instead, you do not know what the content was or whether a conclusion is missing, and you simply leap to the conclusion that whatever I have said about it must be false simply because you are in the process of insulting me, and claiming that I do not "understand" "linguistics" in general. Thus, you do not need to actually review any evidence; you just have to claim that the opposite of what I have stated is true, and thus you have won whatever nonsensical insult-argument you are developing.

Your comments are indeed "nonsensical" with the biased goal of discrediting me personally to avoid considering or reading my research.

826Stevil2001
Apr 11, 2022, 3:55 pm

Doctor Faktorovich, have you ever made a mistake?

827faktorovich
Apr 11, 2022, 9:01 pm

This message has been flagged by multiple users and is no longer displayed (show)
>826 Stevil2001: While most questions ask me to look within myself, your question forced me to look within your background. As your website states: "Steven Mollmann. I completed my Ph.D. in English literature, with a specialization in Victorian literature and science, at the University of Connecticut in 2016. I am currently an Assistant Teaching Professor (or Teaching Assistant Professor, I can never remember which) in the English and Writing Department at the University of Tampa". Thus, immediately upon completing your PhD you found a tenure-track Assistant Professor job. Google Scholar lists these publications: https://scholar.google.ae/citations?user=E_vNI28AAAAJ&hl=fr - 7 articles you published between 2010 and 2018 without any newer publications; there are no book-length projects. The most popular of your articles has been cited 8 times, for a total of 13 citations for all articles. The 2016-7 job application cycle was the last time when I applied for academic jobs, and I am certain I applied for the specific Assistant Professor job at the University of Tampa that you won (I have a record of interviewing with the University of South Tampa). At the time, I was working as an English Lecturer at the University of Texas Rio Grande Valley. I had previously taught for 3 years at similar full-time year-contract non-tenure-track English Lecturer jobs at other schools. Thus, I had a total of: 4 full-time years of teaching experience; I had published 2 full scholarly books with McFarland and 3 journal articles with reputable journals; I had completed a PhD in English Victorian literature from the Indiana University of Pennsylvania back in 2011 (my BA is from UMass); and all this does not even count me starting Anaphora on my own or a myriad of other accomplishments. So, the hiring manager had our two resumes in front of him and he did the math and he calculated that your zero years of teaching experience and zero book-length publications added up to a larger number than my four years of teaching experience, and two book-length publications. Thus, you were hired for a secure Assistant Professor job on your first try, while I gave up on academia and semi-retired to a tiny house in Texas to work on my research into why plagiarism is rampant among students, and why the academia is clustered with professors who have purchased paper-degrees and have hired ghostwriters to compose their dissertations. The term "mistake" is a vague concept that can refer to a single typo, or to a moral shortcoming; yes, I have made a typo before; no, I have not suffered from any moral shortcomings. How about you?

828lilithcat
Apr 11, 2022, 11:06 pm

>827 faktorovich:

The term "mistake" is a vague concept that can refer to a single typo, or to a moral shortcoming

And many things in between. Answer the question.

829paradoxosalpha
Apr 11, 2022, 11:07 pm

>827 faktorovich:

Perhaps your "record of interviewing" showed the level of graciousness that you have tended to exhibit in this thread.

830Keeline
Apr 11, 2022, 11:52 pm

>827 faktorovich:

You have made some audacious assumptions (note the flags on the post) based on some quick research that does not seem sound. For example, you have confused these institutions and assumed that he got a job you may have applied for and seem bitter about still.

University of Tampa: https://www.ut.edu/

University of South Florida (in Tampa): https://www.usf.edu/

Is this the one you call "University of South Tampa" ?

Many cities of some size have multiple academic institutions. Tampa is one of them.

I've had associations with the latter university since the library special collections hold a collection of juvenile series books initially donated by Harry K. Hudson and expanded from that core material. I helped them to add to the collection in the 1990s when they were actively building it in several areas under the leadership of Paul Eugene Camp.

Does it really seem appropriate to attempt to compare CV's and writing credits and citations? I thought this was supposed to be a discussion of this research you have been doing and writing about.

James

831anglemark
Modificato: Apr 12, 2022, 7:36 am

>825 faktorovich:
The following paragraphs are extracts from
Faktorovich, A., 2020. Publishers and Hack Writers: Signs of Collaborative Writing in the "Defoe" Canon. Journal of Information Ethics, 29(2), pp.70-83.
The text is copypasted from an HTML version published at ProQuest, which does not reproduce the formatting and punctuation of the printed version correctly. I have checked against the printed version and fixed the issues I spotted, but if there are punctuation errors in the text they probably stem from the ProQuest text conversion and not from Faktorovich.

-Linnéa

Abstract:
"This essay is part of a larger project on the de- and re-attribution of texts from the "Defoe" canon between authors that include Robert Paltock, Charles Gildon, Alexander Pope, and Edmund Curll. Even without the central computational linguistics study that separated the 47 analyzed texts into distinct linguistic signatures, evidence demonstrates how many other publishers and authors are more likely to have written Robinson Crusoe, Roxana, and Moll Flanders than Daniel Defoe. Moll and Roxana could have been written by Haywood, an ostracized female author who could not put her name on stories about loose women at the risk of having the "Intriguer" label affixed on her. They could have been composed by Chetwood, a publisher of censored books and author, who was interested in complex stories about women. The attribution might have belonged to Edlin, who was outed as a copious ghostwriter in a publicity battle with Stackhouse, one of his contracted writers. As I hope I have demonstrated in this discussion, printers like Edlin ghostwrote many of their releases, or they hired writers such as Stackhouse to write (under their own name or anonymously) content befitting the needs of the market. This essay sets out to evaluate the hidden techniques publishers utilized as they edited, selected and assisted with composing the output of texts in eighteenth century Britain." (Faktorovich 2020 p. 70)

First paragraph:
"This essay aims to uncover the hidden techniques publishers utilized as they edited, selected, and assisted with composing the output of texts in eighteenth century Britain. In 1697, Defoe observed: "I have heard a Bookseller ... say, That if he wou'd have a Book sell, he wou'd have it Burnt by the hand of the Common Hangman; the Man, no doubt, valu'd his Profit above his Reputation."1 Booksellers, printers and publishers had a tight profit margin because the price of the materials and labor needed to produce books was high, while the percentage of the population capable of affording this luxurious purchase was small. Too many publishers who invested in books went bankrupt and were sent to debtor's prison. To survive most were desperate to utilize any creative trick in their arsenal to improve their odds of success." (Faktorovich 2020 pp. 70-71)

Final paragraph:
"Many other publishers and authors are more likely to have written Robinson Crusoe, Roxana, and Moll Flanders than Daniel Defoe. Moll and Roxana could have been written by Haywood, an ostracized female author who could not put her name on stories about loose women at the risk of having the "Intriguer" label affixed on herself. They could have been composed by Chetwood, a publisher of censored books and author who was interested in complex stories about women. The attribution might have belonged to Edlin, who was outed as a copious ghostwriter in a publicity battle with Stackhouse, one of his contracted writers. As I hope I have demonstrated in this discussion, printers like Edlin ghostwrote many of their releases, or they hired writers such as Stackhouse to write (under their own name or anonymously) content befitting the needs of the market. Even at this dawn of the printed book, publishers attributed books to popular authors to garner publicity and increase sales. We must trust the facts in methods such as computational linguistics above our own biases. I am biased toward a hope that a woman's pen is responsible for the "first modern novels," but the facts contradict this idealistic feminist hypothesis." (Faktorovich 2020 p. 82)

832anglemark
Apr 12, 2022, 5:38 am

>825 faktorovich:
About Bowers' book on syntactic relations that you reviewed: if you explain how you interpret section 4.1.3, I can attempt to explain the parts you don't understand. To fully understand Bowers' relational theory you need to understand x-bar theory, and I am not an expert on that. But at least I do understand the basics of syntactic analysis. (To begin with, chapter 4 as a whole discusses syntactic patterns, specifically word order, in different human languages. If you read the first part of the chapter more closely, you will find it much easier to comprehend the structure of the rest of the chapter, even if the models themselves are tough to understand.) I honestly think Bowers' writing style is a bit dense and overly complex, but it is far from meaningless.

Some other points:
* Investigating or preventing plagiarism is not linguistic research.
* Many linguists are neither applied linguists nor generative grammarians.
* "Constituents", in syntactic analysis, is not the same thing as "words", and if you read the entire book without understanding that basic fact, it's no wonder you found it hard going. It would be similar to someone reading your books, not understanding the term "ghostwriter", and thinking that you argued that a team of dead people's spirits had written the texts. Would you like an explanation of what the first sentence of the introduction means?
* "Funny enough, the first sentence of the “Introduction” touches on this “concept” of “relations”" How is it "funny" that the first sentence in a book that proposes a syntactic model based on relations between words mentions relations between words in its opening sentence?
* In general, that you don't understand a text written for subject experts in a field that is not your own does not make the text "nonsensical".

-Linnéa

833Stevil2001
Apr 12, 2022, 7:48 am

>827 faktorovich: I'm not sure how this suddenly became about me, but the post demonstrates your typical attention to detail.

Thus, immediately upon completing your PhD you found a tenure-track Assistant Professor job.

I completed my degree in 2016, and was hired in 2017. The job is not tenure track.

The 2016-7 job application cycle was the last time when I applied for academic jobs, and I am certain I applied for the specific Assistant Professor job at the University of Tampa that you won (I have a record of interviewing with the University of South Tampa).

There is no University of South Tampa.

So, the hiring manager had our two resumes in front of him and he did the math and he calculated that your zero years of teaching experience and zero book-length publications added up to a larger number than my four years of teaching experience, and two book-length publications.

I have been teaching since 2008, albeit not full time; graduate students in my program were instructors of record for composition courses. I was also assistant director of our first-year writing program for almost three years.

Also 1) academic jobs don't really go through "hiring managers," and 2) if you did interview, it wasn't about resume at all. The CV gets you the interview, but everything after that is based on the interview.

Thus, you were hired for a secure Assistant Professor job on your first try...

2016-17 was my third season on the academic job market. It is pretty over-saturated, especially in Victorian literature. I am on year-to-year contracts, so probably not what you mean by "secure."

It is pretty impressive that you don't make mistakes, though!

834susanbooks
Apr 12, 2022, 10:20 am

"hiring manager"? Faktorovich, c'mon, admit it now, you've never been within a mile of a university.

835faktorovich
Apr 12, 2022, 12:10 pm

>829 paradoxosalpha: What do you imagine I would have had to say in these hypothetical interviews, matching what I have been saying in this thread, that would have weighed the hiring decision against me? I am curious what you are imagining in this regard, because I can't imagine hiring decisions being made on the "level of graciousness" or niceness versus rudeness instead of on the candidate's credentials, knowledge of the subject, and focus on the tasks of the job.

836faktorovich
Apr 12, 2022, 12:15 pm

>830 Keeline: I don't know how you apply for jobs, but when I say 2016-7 was my last job application season, I mean it was the last season when I applied to all academic posted jobs that I was qualified for, including both University of Tampa and University of South Florida in Tampa - and I interviewed with one of these, while the other did not invite me to interview.

Comparing the credentials of 2 candidates in a specific job application seemed to be the only way to answer the general question of whether I have made mistakes, as this general question seemed to be questioning my overall worth as a human. All of you guys have also been focusing on my credentials or fallibility instead of on the research, so I just joined your discussion.

837faktorovich
Apr 12, 2022, 12:18 pm

>831 anglemark: My version of the printed journal issue does not have any glitches in it. The abstract does not duplicate the first/last paragraph. Let me know if you have a question about these excerpts for me.

838faktorovich
Apr 12, 2022, 12:46 pm

>832 anglemark: If I had read Bowers' book closely enough, I would have understood every word. My judgement that it is repetitive and nonsensical would have remained intact. I just would have spent days writing a book of my own about precisely why it is wrong in most of its sentences/ structure/ ideas/ capacity to communicate these ideas. Academic books must be dense and intellectually challenging. But too many academic books repeat the same theories in new mixtures in circles, and use big words to cloud readers' comprehension of the substance-less entity that they are browsing. The book appears to have many pages with big words, but when one uses a dictionary to dissect every big word and to create a summary of what even a page or two are precisely saying, one ends up understanding that old research is being recycled as if it is worthy of a new book, or what should have taken a sentence to be said is being said in pages just to take up more space. If you read my other book reviews and search for the few books I rate with a 1-star rating, you'll find other examples of this tendency.

No, I am not directly researching students' tendency to plagiarize. I am studying the plagiarisms the Renaissance Ghostwriting Workshop employed when it had to create a country's worth of book output with only six ghostwriters. Many "Shakespeare" plays plagiarize fragments or scenes from otherwise bylined plays, histories, etc. And many Renaissance books are slight re-writes of earlier non-fiction books under other bylines. If a contractor hires a ghostwriter to put their byline on a book, they would pay the same amount whether a book is plagiarized or not (especially if the plagiarism isn't noticed). Plagiarism that students engage in exhibits some of the same tendencies to avoid work, while reaping the benefits of profit/a grade.

I count most of my linguistic research as applied linguistics. The field of the "generative grammarians" is over-written mumbo-jumbo. I just study grammatical construction in different variants of language(s). I don't study the relationship between grammar and the brain.

I looked up "constituents", as I look up all words that I am informed carry complex meanings. "In syntactic analysis, a constituent is a word or a group of words that function as a single unit within a hierarchical structure. The constituent structure of sentences is identified using tests for constituents." I took an entire PhD-level grammar class that analyzed the structure of sentences with trees that diagrammed the constituent parts. I can explain it to you further, but since you are saying that you understand what it is, we are all set. The problem was not that I did not understand the term, but that the writer was circling around this and other concepts without getting anywhere.

The complexities of why I found the opening sentence funny would require me to quote several similar general and forced-philosophical sentences from other books I have reviewed. But it is better to focus on the sentence itself that I quoted: "There have been a number of attempts in modern era to argue that the primitives of syntactic theory should be relations (or dependencies) between words rather than constituents". The term "constituents", as I defined it above, can be "a word or a group of words". Thus, if you substitute this definition for this term in this sentence, it would be: "syntactic theory should be relations (or dependencies) between words rather than words or groups of words", or basically "x rather than x or xy". Opening a book with a statement like this without clarifying if you mean "groups of words" as the alternative, or if you are using different connotations for the terms, or whom you are arguing against, indicates that either you are very sleepy and are hypnotized by your own words, or you imagine a sleepy reader/reviewer who is not actively understanding what you are saying, and is finding fault with every inaccuracy. To me, these types of glitches are funny. One has to understand a text to find it nonsensical. I absolutely agree. I do not call nonsensical anything that I do not understand. Only if I understand that a piece lacks sense do I say that it suffers from this error.

839susanbooks
Apr 12, 2022, 12:54 pm

Faktorovich, since you've never been within a mile of academia, let me let you in on a secret: interviews for academic jobs are based on YOU. Your CV gets you the interview but your personality gets you the job. The whole time they're evaluating you, they're thinking, do I want to work with this person, will this person bring vibrance to the department, will this person be good for students? You can't take questions or critique, aren't qualified to discuss the things you claim to be expert in, and never admit to being wrong -- which is how we learn. I can't imagine why these imaginary "hiring managers" wouldn't jump at a chance to bring someone like you in. /sarcasm, since you can't seem to tell.

840faktorovich
Apr 12, 2022, 12:59 pm

>833 Stevil2001: A renewable contract is one of the most secure jobs in academia, and the "Assistant Professor" title is another sign that you are on the tenure-track, as otherwise you would be called a "Lecturer". You don't list any of your 2008-2016 teaching experience, so I could not have added up what these come out to. But if you spent 5 years in a PhD program, you would have taught at most 1 class per semester for 5 years as a TA, so you'd divide 5 by at least 4 to get the full-time teaching credits, or 1.25 years of TA teaching experience by 2016. (I should then add that I had worked mostly as a GA research assistant and briefly as a TA for a couple of classes across the 4 years of my MA/PhD studies, so you can add 1 year to my experience as well.) You were the "assistant director of our first-year writing program for almost three years", or from the start of your PhD studies after your 2 years in the MA program? Wow! They appointed you as an assistant director in your first year? You don't think that's a pretty unfair bet? Can you explain what made you more qualified than all of the Lecturers and other faculty working for the University of Connecticut's writing program?

I was joking when I called the dude making the hiring decision a "hiring manager"; you are right, he is just a professor in the department, and a regular bro without any managerial credentials.

I am happy to leave this subject alone. But you keep raising points that are intriguing.

841susanbooks
Apr 12, 2022, 1:01 pm

"A renewable contract is one of the most secure jobs in academia"

Oh, good god. We're really in bizarro world.

842faktorovich
Apr 12, 2022, 1:20 pm

This message has been flagged by multiple users and is therefore no longer displayed.
>839 susanbooks: You can search the web and find records of me inside of academia. Why are you having trouble believing the basic fact that I was inside a university? Maybe you should step back and accept that some things that are proven to be factually true are just true.

Yes, this is the reason you guys triggered me to veer into the subject of hiring decisions: "your personality gets you the job." I was just watching an HBO show about personality-tests; the researchers explained that hiring managers across all industries deliberately select for unintelligent, outgoing and domineering people, while rejecting intelligent, introverted and socially withdrawn ones. They are not selecting nice people, who are not rude. They are selecting rude people who are stupid. Bringing these types of personality measurements into academia means selecting the least qualified professors because they are not as intimidating, or overcompensate by being willing to get drunk at department get-togethers, and to vote with somebody who has been there longer in exchange for the hire (even if their decisions are wrong). This type of selection prefers Elizabeth Holmes because she looks likable, and has imposed a fake outgoing personality on herself (including the non-feminine voice). The capacity to manipulate likability is great in a fraudster; it is a negative in a professor whose job would be conveying complex information in a manner that students will understand and benefit from. By saying that "they" are thinking "do I want to work with this person", you are also suggesting that male professors are questioning if they would want to sexually assault a new female hire. Because if it is a question of intellectual compatibility, professors really don't "work with" each other, as they really have to write their own lesson plans, and teach their own classes. And for men, this is not a problem, as a male professor in a hiring role might be thinking instead: would this guy sympathize with my efforts to sleep with new female professors/ students. And when you say, "would this professor be good for the students", you are really saying, "would this professor give easy As, while ignoring plagiarism, paper-mills, mass-cheating, quid-pro-quo exchanges between some students and professors". Because the truly good professor for students is the one that few of them like (perhaps because many of them don't do well), but all of them leave with a concrete understanding of the subject that has been taught. It is very common for insiders who have these types of disruptive objections (designed to keep the status-quo of academia at its worst) to coat them in claims that the professor is "too rude" or "not an expert in what he claims to be an expert in". The books/ articles I have published and the content of my lectures prove I am an expert in the fields I research and teach in. If you are sticking with the conclusion that personality-tests are the correct measure for professorship; then, don't digress into the intellect being relevant to this personality-competition.

843faktorovich
Apr 12, 2022, 1:23 pm

>841 susanbooks: If you want to know more about the percentages of tenure-track/ renewable faculty, just read this report: https://nap.nationalacademies.org/resource/26405/6_The_Impacts_of_2020_on_Advanc...

844Stevil2001
Apr 12, 2022, 1:27 pm

Not sure how this thread suddenly ended up being about my credentials...

A renewable contract is one of the most secure jobs in academia...

You should tell this to the professor down the hall from me whose contract was not renewed this year because they replaced her with a tenure-track line.

...and the "Assistant Professor" title is another sign that you are on the tenure-track, as otherwise you would be called a "Lecturer".

I think I might know my own status better than you, but an "Assistant Teaching Professor" is not, at the University of Tampa, a tenure-track position. You are welcome to verify this by reading the Faculty Handbook here.

You don't list any of your 2008-2016 teaching experience, so I could not have added up what these come out to.

Maybe you should not make claims that the evidence will not support. I started to type out point-by-point rebuttals of the rest of your paragraph, but then I realized that statement sums it up. You are making a lot of assumptions for which there is no evidence, and I am not sure why I should have to defend a hiring decision made by my program director back in 2011!

845susanbooks
Edited: Apr 12, 2022, 1:36 pm

Faktorovich, your view of academia is a cartoonish fever dream. All of those things happen, but to suggest that entire departments or even hiring committees generally, across the board, are designed to further those activities is absurd and paranoid, especially in English departments where everyone trips over themselves trying to be as inoffensive as possible. You're just being silly. Has this all been heavy-handed, curmudgeonly satire?

846spiphany
Apr 12, 2022, 3:37 pm

>836 faktorovich: All of you guys have also been focusing on my credentials or fallibility instead of on the research, so I just joined your discussion.

No, actually the majority of the discussion has focused precisely on your research -- or rather, on whether or not your comments demonstrate mastery of the subjects you are writing about.

In terms of credentials, it doesn't matter to me all that much whether you or anyone else taking part in this discussion have a PhD after their names or not, or what your career path has been. Qualifications on paper are only one way of establishing credibility. (Which, by the way, is also one of the main reasons why blind reviews are a thing -- so that research isn't judged on the author's credentials or connections, but rather on the merits of the paper itself.)

What makes me critical of your research is the fact that you continually make statements that invite the conclusion that you don't know as much as you claim about statistics, or typesetting, or linguistics, or Biblical history, or any number of other areas.

Note that this is NOT because I myself have supremely advanced knowledge in any of these fields, but because many of the things you write don't make sense even at a fairly basic level. And because, when asked about these basic things by people with far more knowledge of said topics than me, you are not able to provide satisfactory explanations.

847VeraJunior
Apr 12, 2022, 6:50 pm

Instead of buying birthday presents for my mother, I spent the entire evening reading this thread, because it is fascinating and horrifying in a way similar to watching a car crash in slow-motion. I commend all the people who continue trying to discuss the topic in good faith and I will now 'ignore' this topic, because despite having learned all kinds of interesting things and laughing out loud while reading, I fear if I continue I will end up crying.

848clamairy
Apr 12, 2022, 7:48 pm

>847 VeraJunior: I envy your ability to walk away.

849faktorovich
Apr 12, 2022, 9:04 pm

>844 Stevil2001: Yes, you are a better judge of whether you are on the tenure track or not. And it is absolutely up to you if you choose to explain your experience or not. My curiosity was merely peaked in this direction by your previous response. You absolutely do not "have to defend" anybody's hiring decisions. I was attempting to use your case as a case study into how hiring decisions and general evaluations of the value of an academic are made.

850faktorovich
Apr 12, 2022, 9:20 pm

>845 susanbooks: Since "satire" is "the use of humor, irony, exaggeration, or ridicule to expose and criticize people's stupidity or vices"; then, if you think what I said was funny, or exaggerated; then from your perspective it is indeed satirical. Either way I was certainly being "curmudgeonly" or negative. If I exaggerated these problems, it was merely because the idea that "personality" is the primary reason for hiring decisions in academia is deeply offensive to me, and suggests all of these underlying problems. It is only an "unrealistic" distrust of others, if you take my exaggerations as non-exaggerations. Obviously, there are some good academics and good hiring academics who do not commit regular sexual misconduct in the workplace and need to cover it up with agreeing coworkers. And there are obviously some professors who fail all students who plagiarize, or who they suspect of purchasing papers from paper-mills. However, I have read many news stories over the years about how such honest professors have been fired while trying to expose such misdeeds. Here are some articles I found when searching for such recent cases:

https://www.kgw.com/article/news/local/fired-professor-files-whistleblower-lawsu...
https://www.maciverinstitute.com/2018/06/regents-committee-recommends-firing-whi...

Very often "trying to be as inoffensive as possible" can additionally be weaponized against an undesirable professor who is filing complaints or is suspected or shortly following complaint(s) against others' misdeeds. For example, a professor who files a harassment complaint can suddenly be found to have offended somebody by saying "gay", or by discussing "critical race theory". Though these are recent developments, and other types of "offensive" content was previously used to find professors to be controversial enough to fire over (without regard for freedom of speech) and especially to terminate a professor on an otherwise unbreakable/ timed contract. I can get into the case law, if these seem too paranoid and insufficiently in legalese.

851faktorovich
Apr 12, 2022, 9:24 pm

>846 spiphany: What I am hearing you say is that you have not understood some of the complex points I have raised in this discussion, and that I have not fully explained these points when I was asked about them. This can be easily remedied if you find whatever points you have struggled to understand, and repeat the questions that are troubling you, and I will address them in more detail and in non-specialized language (as much as possible).

852faktorovich
Apr 12, 2022, 9:25 pm

>847 VeraJunior: Dear Vera Jr.: I hope you will find peace in the silence.

853abbottthomas
Edited: Apr 13, 2022, 8:34 am

>849 faktorovich: Your curiosity "was ..... peaked". I think that while your curiosity may 'have peaked' - rather like VeraJunior's - you would be better using the transitive verb 'piqued' in this sense.

This post is to confirm that I am paying attention.

854lorax
Apr 13, 2022, 9:09 am

faktorovich (#836):

The question of suitability for a particular job is completely unrelated to the question of "Have you ever made mistakes". That's a question of basic self-awareness and introspection, not of your worth as a human nor of your skills in a particular area. We've all had ample evidence in this thread for us to make a decision about your qualifications to speak on particular subjects, and on your skills and knowledge in fields touched on here, and your worth as a human is quite beside the point - I've seen nothing here to suggest that you're not a kind and decent person.

However, it is an adage in academia and elsewhere that the easiest person to fool is yourself, and that it's very important to go over your research with a critical eye to look for mistakes of one sort or another. An inability to recognize your own capacity for error is far, far more serious than a particular error in many contexts.

(Have I ever made mistakes? Of course I have! I don't know that I've made any yet today, unless perhaps it's responding to this thread, but it's only 9:00 AM. Give me a couple hours.)

Since you seem to take this question as an attack, may I recommend a book? My son found it useful when he was in first grade.

855faktorovich
Apr 13, 2022, 12:30 pm

>853 abbottthomas: While the cliché expression might include "piqued" (stimulated), I meant to say "peaked", as in, reached a peak or top. The stimulation of curiosity sounds like an erotic endeavor, whereas I was referring to intellectual curiosity on its curve of interest.

856faktorovich
Apr 13, 2022, 12:41 pm

>854 lorax: The book you recommend is indeed a curious moral lesson: "Beatrice is so well-known for never making a mistake that she is greeted each morning by fans and reporters, but a near-error on the day of the school talent show could change everything." This is a perfect summary of how intelligent people, and in particular intelligent girls/women, are treated in modern society. The incompetent people are so jealous of the attention a scientist, researcher, writer, or other type of intellectual receives that they seize on any minor "near-error" to claim that the most brilliant people among us are thereby proven to be idiots. Thus, a single typo can be used to give a potentially scientifically-groundbreaking essay a "D", while a minor repetition can be used to reject an essay from a scholarly journal. Meanwhile, those who make an enormous quantity of mistakes (plagiarism, constant repetition, nonsensical anti-logic) are accepted because they make incompetent people feel superior. I would recommend that you learn from this lesson and step back and ask if your goal is the promotion of worldwide incompetence. If it is, you are all set, carry on. If it is not; then, all new theories have to be fully considered and focused on, instead of being dismissed because doing so is in your self-interest. For example, what brilliant things was Beatrice doing before this "near-error"? What about a book dedicated to exploring these roots of intelligence that lead to the happy ending of Beatrice becoming a brilliant scientist? Instead, we have a horror story about a typo that stopped a woman from ever trying to be perfect (i.e. brilliant) again.

857lorax
Edited: Apr 13, 2022, 1:43 pm

It's a picture book for young kids. You ask us to read many volumes of your own work and can't be bothered to read 20 pages before totally dismissing it out of hand, and in the process completely missing the point?

Hint: One of our family mottos is "Mistakes are chances to learn and do better". If you, like the character in the story, are petrified of making mistakes, you will take no risks and never learn.

858abbottthomas
Apr 13, 2022, 2:40 pm

>855 faktorovich:. What you meant was made clear by your last sentence in >840 faktorovich:. “… was……peaked” is ungrammatical nonsense.

8592wonderY
Edited: Apr 13, 2022, 3:01 pm

>858 abbottthomas: Do you refer to her usage in >849 faktorovich: ?

860faktorovich
Apr 13, 2022, 8:47 pm

>857 lorax: There is no free version of this children's book online. I am offering the full set of BRRAM volumes to all of you for free in a pdf to assist your ability to review it fully without obstacles. Are you the author of this children's book, and are you asking me to review it? Then, you have to send a free review copy. I do not even review fiction books unless there are special circumstances, so I would not review a children's book typically. Since you guys were advertising the title and stating it was essential to this discussion, I went out of my way to comment extensively about it. But you are stating that I have misunderstood the moral? You have to quote the entirety of this book's text for us to be able to evaluate if I am indeed mistaken or if you are right. And indeed you have not stated what the book's "point" is, but instead have introduced your family's personal motto. Have you read this book? Why would any adult human without children who does not review children's books have read any children's book, or why would he or she be compelled to purchase a children's book just to argue a point? But to focus on your motto. Yes, mistakes are chances to learn from them; but it is better to learn first, and then through understanding avoid making mistakes. I am not "petrified of making mistakes". That's a ridiculous conclusion. I write and publish 3 issues of my own 2 journals annually (6 books), and write a few other books of my own in the same span; if I was even slightly frightened of "mistakes", I would have been paralyzed from writing even a short story, and certainly incapable of writing up to or over a dozen books annually. Instead, what you and others have been saying is that I should be so afraid of making mistakes (based on the typos you have been pointing out) that I should stop writing without hesitation. Specifically, your previous motto for me was: "An inability to recognize your own capacity for error is far, far more serious than a particular error in many contexts." So you were saying that I have to be afraid of my mistakes, or my failure to recognize this "capacity for error" would be a "serious" personal shortcoming. And now you are saying that your motto has changed, and now it is important to not focus on mistakes or to not be afraid of them. You gotta pick a better motto, or read your children's books more closely to explain your moral philosophy consistently.

861faktorovich
Apr 13, 2022, 9:04 pm

>858 abbottthomas: You are saying that the second of these sentences is correct: 1. "My curiosity was merely peaked in this direction." 2. "My curiosity has/have? merely peaked in this direction." This is because the "transitive verb" has to be used (and thus you had suggested changing "peaked" to "piqued"). A transitive verb is simply all verbs that connects the subject to an object; so that in "The man walked to the dog", "walked" is a transitive verb. What you are saying is nonsensical because: 1. There is nothing more "transitive" or more appropriate for a "transitive verb" about any of these variants (was/has/have or peaked/piqued). Instead, you seem to be trying to say something about the present/past tense, but failing to use the proper verbs (is peaking/ was peaking). It is simply awkward to say, "My curiosity has/have peaked" or "piqued". Both "piqued" and "peaked" can be used as past tense verbs (even if "peaked" can also be used as an adjective); when these words are used as past tense verbs, they can be substituted in a sentence without changing anything else around them; one just means "reached highest point" and the other "stimulated".

862anglemark
Edited: Apr 14, 2022, 7:37 am

>861 faktorovich: What are you talking about? This has nothing to do with transitivity, especially since "peak" can be intransitive or transitive. (And in "the man walked to the dog", "walked" is intransitive. You probably meant "the man walked the dog". It is an easy error to make.)

It also has nothing to do with using the "past tense". "has peaked" is in the present perfect, not the past tense, while "was piqued" is indeed in the past tense, passive voice. The two verbs are not interchangeable in the expression "my curiosity was piqued" (adding "in this direction" is not idiomatic – you would say "by this"). "My curiosity was / is peaked" is completely unidiomatic, I get three hits for "curiosity is peaked" in the entire COCA corpus, none for "curiosity was peaked". The corresponding figures for "curiosity is / was piqued" are 15 and 43 respectively. (COCA = Corpus of Contemporary American English, https://www.english-corpora.org/coca/ ).

-Linnéa

863Keeline
Edited: Apr 14, 2022, 11:05 am

There is no free version of this children's book online.


It is risky to make absolute declarations like this.

Archive.org has a copy of the book in question that can be read on screen with a free membership.

https://archive.org/details/girlwhonevermade0000pett

The site is a good resource for finding old public domain material from several sources as well as copyright material to sample and see if it is worth locating.

It is not the only one, but it is something every researcher of books in English should be aware of for the times it can help.

James

ETA: My iPad added some typographer's quotes instead of "straight quotes", so this created a malformed HTML link. It should be better now. I will test and correct further if needed.

864lilithcat
Apr 14, 2022, 10:19 am

>863 Keeline:

You have extraneous quotation marks at the end of the URL which result in a "page not found" error.

https://archive.org/details/girlwhonevermade0000pett

865faktorovich
Apr 14, 2022, 1:30 pm

>862 anglemark: Transitive verbs are "able to take a direct object", whereas intransitive verbs "do not require a direct object" or "do not have objects". "Have or has is used with a past participle to form the present perfect tense. This tense designates action which began in the past but continues into the present." To say that interest "has peaked" is illogical because it means it began peaking in the past and is continuing peaking in the present; a peak is the highest point on a graph; so, it could have reached this point in the past, or in the present, but if it is continuing at this point from the past into the present, it has flatlined instead of peaking; thus to say "has" in that sentence is incorrect. Instead of analyzing rational grammatic rules, you are basing your argument on nativism, as "idiomatic" means "using, containing, or denoting expressions that are natural to a native speaker". Just because few people have said "curiosity was peaked" before does not mean it fails to express the precise meaning I intended to convey, as I stated that my curiosity was at its peak point at a specific past moment just before I wrote down that this peak had occurred. When expressions are too frequently repeated unchanged, users tend to ignore what they precisely intend to say in favor of repeating a comfortable cliché. Why didn't you check the present perfect tense idea you were proposing of "has peaked"?

866faktorovich
Apr 14, 2022, 1:47 pm

>863 Keeline: Dear James: That's helpful. I was able to view this book when I created an account. After looking inside, I absolutely confirm my conclusion that this is a very negative book designed to convince girls that incompetence is better than struggling to do the right things and to be intelligent. After she is caught in the semi-error, the ridicule she receives triggers a downward spiral where she puts peanut butter and jelly on the outside of sandwiches, and falls a lot during skating. In other words, a single mistake made her give up on being a good girl altogether, and for her this means failing to keep up with hygiene/ proper self-maintenance, and engaging in dangerous activities. While this seems adorable in a children's book, try working as a K-12 substitute teacher, and you'll notice how American culture is promoting such misbehavior and a propensity to err as much as possible. Kids throw things at teachers; they repeatedly disrupt class to yell at others; they might even engage in self-harming behaviors. And after teachers, psychiatrists, and media promote such childish misbehaviors as healthy, these kids are then told that these misbehaviors are also signs of mental illness, and thus these kids are put on drugs to neutralize their progressive mistake-making, but no drugs can fix problems of mis-conditioning. And then finally at 62 one of these misbehaving kids might grow into a man with dozens of convictions who shoots up a subway and injures 29 people. It all starts when children are taught it is better to misbehave because they can never achieve perfection in their behavior. Such nonsensical conditioning should just be avoided in favor of simply teaching the rules of language and behavior and always rewarding their best possible execution, without turning an insignificant mistake into a public shaming; nor should the quest for perfection be derided as equally shameful (especially in girls).

867anglemark
Apr 15, 2022, 8:15 am

>865 faktorovich: OK, so what you are saying is simply that you made a spelling error, "was peaked" instead of "has peaked". Conciseness is a virtue (which I do not always possess myself). Your partially correct claims about tenses and irrelevant asides about transitivity obscured your point.

8682wonderY
Apr 15, 2022, 11:18 am

>867 anglemark: I think you still do not understand. @factorovich has never made an error. Ever. And I’ve got money riding on her shortly responding to you to clarify that in another essay-length post.

869faktorovich
Apr 15, 2022, 1:35 pm

>867 anglemark: No, I did not make an error of any kind in the "was peaked" statement. You guys are making grammatical errors as you are attempting to find mistakes in my perfectly correct statement.

870faktorovich
Apr 15, 2022, 1:37 pm

>868 2wonderY: You are probably correct to put money on that bet. Except there was not enough meat in the errors anglemark made for me to write an entire essay about them.

871paradoxosalpha
Apr 15, 2022, 2:45 pm

Literate anglophones easily recognize "was peaked" as an eggcorn. The greater the protestation, the more absurd the error.

872Keeline
Edited: Apr 15, 2022, 3:27 pm

There is a phrase for that called "Muphry's Law". As the Wikipedia page describes, it is the propensity to have errors in a message complaining of other errors (usually published).

Phrases such as "piqued my interest" are idioms or colloquialisms that are harder to grasp until you've seen them explained in writing. The descriptive dictionary definitions (as compared with prescriptive dictionaries which are almost completely gone it seems) let you down when two words that sound similar have close but different meanings. This is compounded when some aspects of language are absorbed verbally rather than in writing. Those for whom English is not a first language can be challenged by these nuances and it requires extra effort to embrace them. Phrases like "on the other hand" don't make sense from a dictionary definition but it is what is used and not "in the other hand."

If one writes "my interest was peaked" it is confusing because it fist says that it rose to a certain apex but is now on the decline. While this could be true in a certain usage, it does not follow the conventional form of "my interest was piqued." There are many pages which describe the nuances and these are just three of them.

Vocabulary.com

Merriam-Webster.com

WriteAtHome.com

In an effort to lighten the mood but still be relevant, the Weird Al Yankovic song called Word Crimes calls attention to many phrases that are confused and tries to solidify the correct definitions through animated illustrations. It is an "ear worm" so be warned of that. You may be hearing parts of it for the next day or two.

James

873lilithcat
Apr 15, 2022, 6:27 pm

>872 Keeline:

If one writes "my interest was peaked" it is confusing because it fist says that it rose to a certain apex but is now on the decline.

And it would be better if one wrote "my interest had peaked".

874faktorovich
Apr 15, 2022, 8:44 pm

>871 paradoxosalpha: An "anglophone" is an "English-speaking person", and this category includes World Englishes, or a range of variants that are spoken as the first language in countries as far apart as Australia, Liberia, Ireland, Jamaica, Guyana, India, Pakistan, and South Africa. Clichés or eggcorns vary widely even within a single country, or between Southern and Northern regions in the US. The etymology of "pique" is that it is derived from the French "piquer" or "to prick, sting" (the noun just means "prick"). Do you feel so strongly about the "prick" as to continue debating this point?

875faktorovich
Edited: Apr 15, 2022, 9:22 pm

>872 Keeline: You have arrived at the big argument for all grammarians on whether it is better to prescribe or describe grammatical rules. I side with the prescribers, as I think it is worthwhile to risk committing a new error to consistently correct all noticed errors during an edit. One of the reasons I decided to translate multiple volumes in the BRRAM series is that grammar and spelling rules had not been strictly prescribed yet during the Renaissance (though the Workshop started the process of standardization); thus, for example, there are many variant spellings for the same words. It is difficult even for a specialist in Early Modern English to read a text with these inconsistencies without looking multiple words up in a Middle English or other dictionaries. The English language would benefit from corrections of awkward and illogical spellings, which are currently set as required dictionary-spellings. It is the job of grammarians to at least enforce the existing rules (if not to write textbooks on how to change them for the better) to avoid slipping into a post-Babel incomprehension of others.

You are irrationally attached to "piqued my interest" because your grammar teachers have insisted this is a required spelling for this phrase. The best grammarians not only understand the commonly used rules (including minor preference for word-choice or spelling), but also understand how rules can be altered when they are in conflict with other rules. It is irrational to state that all worldwide English users must use "piqued", just as it would be absurd to force everybody to instead use, "pricked my interest". The latter might be offensive or censorable in some highly religious cultures. In these cultures it might be just as necessary to instead use "peaked", as it would be to use the euphemism "Do your business" instead of "to defecate". It is far healthier for a language for new phrases to be introduced because of such cultural preferences, than if prescription enforced only a small set of phrases to be "correct", while all variants and deviations were "incorrect".

I explained this point and you are paraphrasing it, but your wording, "rose to a certain apex but is now on the decline", applies both to "was peaked" and to "my interest was piqued", because the latter is as much as to say "my interest was stimulated", and it also expresses a past-tense point of top-stimulation that has now declined.

Weird Al Yankovic's "Word Crimes" is indeed a funny song. But you are thinking too simplistically if you assume that the distinction between "peak", "peek" and "pique" is one that can be solved by checking a quick reference guide. "Grammer" is an obvious misspelling of "grammar" (according to Yankovic), but "peak" is not a misspelling. The intersection in meaning between "peak" and "pique" indicates that a grammarian or English teacher have to be aware of this intersection to avoid making an error him or herself when correcting a student between these two terms. Only very bad teachers fail to consider if a student might be grammatically/spelling-wise correct, and only a teacher's conditioned biases are pushing him or her towards a nativist correction. For example, I could have been making a joke regarding the use of the "piqued" spelling during the Renaissance to mean "pike" (prick), as used in "John Smythe's" "Instructions, Observations, and Orders Mylitarie" (i.e., Military) (1595): "How the captains and officers are to reach their piquers to shoulder their piques." https://www.google.com/books/edition/Instructions_Observations_and_Orders_Myl/t8... I was just reading this book the other day, as part of my research for "Restitution".

876anglemark
Apr 16, 2022, 6:44 am

>873 lilithcat: Here's something interesting: up until the 20th century (-ish), BE was used as an auxiliary to form the present perfect/past perfect tense in parallel with HAVE with intransitive verbs. ("He is come" vs "He has come"; for instance, in Pride and Prejudice, Mrs Bennet says "My dear Jane, make haste and hurry down. He is come -- Mr. Bingley is come -- he is, indeed."). Around the beginning of the 17th century, BE was used more than 90% of the time, by the year 1700 it was ~80%, by 1800 50-60%, and by 1900 not more than ~10%. (I wrote an undergrad thesis on this in the mumbleties, and these percentages are from the research available to me then.) There were various factors influencing which auxiliary writers used, and verbs that kept the use of BE longer were verbs like come, grow, and become, denoting states or results of action rather than the action itself. So "was peaked" as an active verb phrase would probably not have looked weird a few hundred years ago. It's similar to the variation that exists in German, where some intransitive verbs use HABEN ("have") and others SEIN ("be").

Well, I think it's interesting :-)

877lilithcat
Apr 16, 2022, 8:30 am

>876 anglemark:

it is interesting.

878faktorovich
Apr 16, 2022, 1:42 pm

>876 anglemark: I appreciate your research, but "was peaked" does not look "weird" to a professor of grammar today. It only looks weird to a casual English speaker who is too familiar with or has overused the cliché. I searched for variants in books, and found:

1868: "She watched Wentworth's face with a good deal of interest. He piqued her." Errors: Repetition of synonyms interest/piqued; attracted = piqued. https://www.google.com/books/edition/Crowned/KN0BAAAAQAAJ?hl=en&gbpv=0

1989: "Now, knowing the interest that you folks have, I am sure you can identify some areas in here where your interest will be immediately piqued." Errors: Double repetition of interest + piqued; circular logic that restates past and future interest. https://www.google.com/books/edition/Interest_Group_Meeting_Proceedings/aI4RAQAA...

1994: "HAS INTEREST IN QUALITY PEAKED IN THE U.S.?" Correct application of past perfect tense to describe a top of a curve (peaked) that appears to be continuing into the present. https://www.google.com/books/edition/Baldrige_Award_Winning_Quality/nf7sAAAAMAAJ...

2021: "it is observed that the value of green city interest peaked between 2016 and 2017, with the value of 100%." Correct usage of past tense to describe a tip of interest at a point in the past. https://www.google.com/books/edition/Innovation_in_Urban_and_Regional_Plannin/P5...

2021: "to reach the primary statement early enough that the reader's interest is maintained, but there still is little guidance about how to get that interest peaked." Here is a writing teacher using "peaked" as part of a textbook about the craft of writing. https://www.google.com/books/edition/Teaching_Creative_Writing_to_Second_Lang/vI...

2021: "I am also perplexed that in researching the effectiveness of these programs, because of my curiosity, statistical data has not been updated in some areas since 2014. My curiosity has peaked and I plan to continue with additional..." Errors: Repetition in conversational usage. https://www.google.com/search?rlz=1C1CHBF_enUS720US720&tbs=bkv:a&tbm=bks...

As you can see from these examples, while there are online grammatical cheat-sheets that suggest there is a "correct" usage of this phrase that includes "piqued" and a specific tense, published books from different publishers and in different genres use many variants that include "peaked" and a range of tenses. The errors in these usages tend to be with repetition or failure to understand that "piqued" means "interested" or "stimulated", and so both don't need to be repeated in a sentence. The application of "peaked" is less prone to errors because writers understand the exact meaning of this term.

879MrAndrew
Apr 17, 2022, 3:33 am

This all started with "my curiosity was merely peaked", back in >849 faktorovich:. Interest is different than curiosity, and "my curiosity was piqued" is a well-known phrase, or cliché if you prefer. Whereas "my curiosity was merely peaked" is, at best, odd, and therefore lacking in clarity of meaning.

There. That should easily get us to 900 posts, or even 1,000.

880MrAndrew
Apr 17, 2022, 3:42 am

>876 anglemark: that do be interesting.

881abbottthomas
Apr 17, 2022, 5:39 am

>879 MrAndrew: I am waiting, fascinated, for post #1000. I never imagined this thread could get so far. When I log on to the thread on my iPad a small blank square appears transiently at the bottom right of my screen. Is this an omen?

882faktorovich
Apr 17, 2022, 12:07 pm

>879 MrAndrew: One way to continue this conversation is to consider the alternative definition for "piqued" as "irritated or resentful", so the cliché can also mean, "my curiosity was irritated", as well as "my curiosity was stimulated"; the former indicates a rash on your curiosity, while the latter is a repetition because it would be more direct to just state one is "curious", as opposed to saying one's sense of curiosity was erotically or emotionally "stimulated". In contrast, adding that one's curiosity was "peaked" means it reached a high-point, so it is not a repetition to add "peaked" to already stating one was/is "curious". Can you explain why you believe "...peaked" is "lacking in clarity of meaning", when the opposite is clearly the case?

883Matke
Apr 17, 2022, 12:17 pm

This is…fascinating.

I don’t think I’ve ever come across someone who has never in their life made a mistake in judgement, or jumped to a wrong conclusion, or never had a wrong idea that had to be reworked.

What a rewarding and educational, if exhausting, experience this thread is.

8842wonderY
Apr 17, 2022, 2:14 pm

>883 Matke: My daughter had a middle-school teacher who was perfectly perfect. The kids would bring home stories and the parents were able to corroborate them. I gained some understanding reading Scott Peck’s People of the Lie.

885faktorovich
Apr 17, 2022, 8:46 pm

>883 Matke: As I have been explaining, most of the BRRAM series is a translation of texts from the British Renaissance that haven't been translated before. Thus, the main task I am doing is "reworking" or editing or making changes to the original works to perfect them and make them both easy to understand and to annotate and introduce them in a way that invites deeper understanding of the covered subjects. I have been working and reworking and adding evidence to my re-attributions of the British Renaissance for over 2 years now. You might have noticed my process in this discussion regarding "piqued" and "peaked"; first, I make a guess or a hypothesis; then, I test it; then, I describe the results; then, I search for additional evidence to support the conclusion I have come to; and then, as I work on new hypotheses, I tend to find still more evidence to support earlier conclusions that I had not even realized I was looking for. Meanwhile, you guys are digressing into philosophical questions about what is the nature of perfection and evil. I find it far more interesting to study the etymology of any single word in its brilliant complexity to the depth that fits a given purpose, than to contemplate if a perfect definition of any word is theoretically possible.

886prosfilaes
Apr 17, 2022, 10:29 pm

>885 faktorovich: You might have noticed my process in this discussion regarding "piqued" and "peaked";

I noticed that you made a mistake in which word to use and then doubled down on it.

I make a guess or a hypothesis; then, I test it; then, I describe the results; then, I search for additional evidence to support the conclusion I have come to; and then, as I work on new hypotheses, I tend to find still more evidence to support earlier conclusions that I had not even realized I was looking for.

Which is notably not the scientific method. You take the conclusion you get, and instead of continuing to test it, as a scientist would, you take it as fact and search for additional evidence. When you work on new hypotheses, you don't keep your eyes open for things that could disprove previous conclusions; you find evidence to support your assumptions. Your method is biased towards making arguments for your conclusions instead of trying to find evidence for or against your theories.

887faktorovich
Apr 17, 2022, 11:49 pm

>886 prosfilaes: If you had read my argument, you would have noticed that it is in fact more grammatically correct to use "peaked" than "piqued", and that the "peaked" variant is used about as frequently in published books as "piqued." It is actually you guys who are refusing to admit when I am right, and you are wrong. An argument is won when logic is on one's side, not when one keeps repeating the original objection.

You just did not read any of the statement that you are quoting that starts with "I make..." I explain that I keep searching for additional evidence and re-testing the findings. I have not found any evidence to contradict my final conclusion regarding the number and identity of the Renaissance Workshop ghostwriters. I have found an overwhelming volume of evidence to confirm it, and keep finding still more evidence every single day, as I continue editing and researching. You are the one who is not even keeping your eyes open to what I am actually saying. In contrast, I am very alert, and would absolutely notice if any contradictory evidence had come up. It would be impossible for any researcher to write 17 books (as I have done so far) full of entirely new evidence on every page without being extremely alert and closely analyzing each piece of data, documentation, etc. You guys could be spending this time actually examining the evidence I present in BRRAM (and asking questions about specific points where you might have imagined a different interpretation), if you asked for review copies. But instead, you keep repeating that you are refusing to review my work because you are biased against my conclusions.

888MrAndrew
Apr 18, 2022, 1:22 am

>882 faktorovich: "the cliché can also mean, "my curiosity was irritated"... Pretty sure that flies in the face of the definition of a cliché.

My own curiosity has been piqued, if not "erotically stimulated" (blergh), by this thread. And it has not "merely peaked", or somewhat plateaued, or slightly declined, or precipitously dropped, or any other bizarre, conveniently concocted metaphor.

I read the earlier suggestions that this was a world-class trolling exercise with scepticism. Now I'm wondering if it's possible that someone can be a world-class troll without realising it. An accidental troll, if you will. Someone with an unshakeable self-belief and yet possessing zero self-awareness. Fascinating.

889Felagund
Edited: Apr 18, 2022, 2:58 am

> Someone with an unshakeable self-belief and yet possessing zero self-awareness. Fascinating.
Maybe a rogue Artificial Intelligence? The interview linked by the first message of this thread would disprove this theory, but I will keep digging for facts that support it ;-)

890scaifea
Apr 18, 2022, 7:06 am

>888 MrAndrew: Now I'm wondering if it's possible that someone can be a world-class troll without realising it. An accidental troll, if you will. Someone with an unshakeable self-belief and yet possessing zero self-awareness.

If you need further proof, I can introduce you to my MIL...

891lorax
Apr 18, 2022, 8:24 am

faktorovich (#885):

then, I search for additional evidence to support the conclusion I have come to; and then, as I work on new hypotheses, I tend to find still more evidence to support earlier conclusions that I had not even realized I was looking for

Earlier, I remarked on the truism that the easiest person to fool is yourself, and how dangerous this is in a scientific context; indeed, something that is generally taught the first time students encounter the scientific method (often in high school, though sometimes not until an introductory course in college) is that one must always try to disprove their hypothesis, not to prove it. I suppose I should no longer be at all surprised that your understanding of the scientific process, like that of the relationships between languages, or statistics, or the importance of error, or for that matter a charming children's book about growth mindset, is utterly wrong.

892drneutron
Apr 18, 2022, 8:37 am

>889 Felagund: I think this is a Turing test and we're failing...

893MrAndrew
Apr 18, 2022, 9:08 am

>892 drneutron: yeah, I've been thinking the same thing. Some sort of challenge set up by the AI that Stephen Hawking downloaded his intelligence into, before his "passing". Beating Chess and Go grandmasters must get boring after a while. If you can convince LT folks that they are talking to a real person, well...

894paradoxosalpha
Edited: Apr 18, 2022, 9:55 am

>892 drneutron:
A Turing test is to test the putative AI, not the human interlocutors. Humans can "fail" a Turing test if they are indistinguishable from bots, I guess.

895faktorovich
Apr 18, 2022, 12:58 pm

>888 MrAndrew: A cliché can have more than one common meaning if it is a double entendre, such as "Lady, shall I lie in your lap?" in Percy's "Shakespeare"-bylined Hamlet.

It is certainly up to you to relate what level of curiosity you are experiencing.

There is a difference between deluded self-belief, and self-belief that has been arrived at by researching and writing 17 books that prove one's self to be correct. Only somebody who refuses to consider all these volumes of evidence can be scientifically described as "deluded" or "believing something that is not true" because they believe in the untruthfulness of the evidence without having read it.

896anglemark
Apr 18, 2022, 12:59 pm

>891 lorax: One thing that was remarked upon early on is the fact that Dr Faktorovich has not had any training in the scientific method – that's not how literary scholarship works (though I don't know if any part of her methodology would be accepted by literary scholars either). So her lack of understanding of how a hypothesis is posed and addressed isn't really any stranger than her lack of understanding of statistics or stylistics or handwriting analysis. That she keeps making new bonkers claims in pretty much every post, while going lalala I can't hear you whenever anyone points out an obvious error, is just an added layer of frustration. (When did her book series grow from 14 to 17 volumes, btw?)

-Linnéa

897faktorovich
Apr 18, 2022, 1:01 pm

>889 Felagund: Your response can be applied to a denial of pretty much any scientific theory from climate change, to the earth being round. A clear sign of "Artificial Intelligence" or bots activity is that whatever is being said can be applied to denying or arguing against or trolling pretty much anything. The less specific the criticism, the less the writer has to engage in original research, and the more likely a bot could have regurgitated a standard denial reply.

898faktorovich
Apr 18, 2022, 1:02 pm

>890 scaifea: Does your mother-in-law have the evidence you are citing?

899faktorovich
Apr 18, 2022, 1:07 pm

>891 lorax: By re-attributing the entire British Renaissance I have disproven a 400+ year-old hypothesis with overwhelming proof of my original conclusion. All previous researchers have been bending or manipulating the computational and other types of evidence to reaffirm or to prove their hypothesis of hundreds of "authors" acting during this period. In contrast, my research has undergone the overwhelmingly more difficult task of dismantling these past conclusions by finding a mountain of evidence for an entirely different interpretation. No previous researchers have tested 284 texts from this period, so none have done as much to test the accuracy of their conclusions as I have.

900faktorovich
Apr 18, 2022, 1:11 pm

>893 MrAndrew: Denying the reality of a person because she counters your understanding of "piqued" and "peaked" is a uniquely delusional kind of thinking. What about coming back down to earth and examining the evidence on the GitHub page I have cited, and realizing that there is nothing unreal about the enormous quantity of clearly presented data there (including pictures of handwritings for those less interested in reading stuff)?

901faktorovich
Apr 18, 2022, 1:24 pm

>896 anglemark: Explaining the scientific method is a standard part of Introductory Writing as well as advanced Research Method classes that I taught at colleges for years. If you don't know that a hypothesis is taught in such classes, you probably have never taken a basic writing class. Here is an example of a graduate class at Harvard that touches on the "scientific method": https://canvas.harvard.edu/courses/4213/assignments/syllabus. The main textbook is: Strunk, W. & White, E.B. (2009) The elements of style. New York, NY: Pearson Education. I have used this book in my writing/ research classes. One of the main components of this class is how to avoid "Plagiarism", and this is a major component of my re-attribution of the Renaissance to ghostwriters. The "scientific method" is a broad concept that refers to: "a method of procedure that has characterized natural science since the 17th century, consisting in systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses." By just reading this definition, everybody has grasped the general principles of what the scientific method is. The strategies or methods that are needed to test computational-linguistic attributions are specialized processes that go beyond merely grasping the "scientific method".

I am continuing to write the BRRAM series and adding thousands of words to it daily; as I have explained the first 14 volumes were the first half, while the second half of the series is forthcoming in 8 months or so.

902Keeline
Apr 18, 2022, 2:54 pm

Since we are now back to referring to the "scientific method" (which is a lot more specific than Strunk & White from The Elements of Style which is obviously a writing advice book and not a scientific book) after 900 replies, I would like to know how this project started.

That is to say, why were you inspired to do any textual counting and other analysis on works from this time period?

What was the goal?

When developing the analysis techniques, what sort of control was used to assess the efficacy of the measurements?

What is the ramification of the findings?

James

903Crypto-Willobie
Apr 18, 2022, 7:41 pm

Dang! I missed number 900!

904faktorovich
Apr 18, 2022, 9:18 pm

>902 Keeline: As I have mentioned before, I had previously written a book of around 300,000 words in which I applied my computational-linguistic attribution method to under 80 British 18th-century texts. One of these articles was published in the Journal of Information Ethics. During discussion about that article with the Journal's editor, Hauptman, he challenged me to test my method on the more complex Renaissance period. I realized that journal editors have not been taking my 18 or so essays on the 18th century seriously because it seemed they did not care about the Defoe and other literary mysteries of that time that had faded from mainstream scholarly discussion; in contrast, there are constant allusions to "Shakespeare" and new books about the re-attribution of the Renaissance that have been published across the past hundreds of years and into the present. Thus, I started testing the Renaissance, and it was indeed an extremely complex computational-linguistic challenge, because unlike in the 18th century, there were too many cross-byline matches in the Renaissance that indicated massive ghostwriting efforts under multiple pseudonyms. So, I expanded the study gradually to include more of the relevant texts that came up in my gradual research into this period and its texts, until I reached 284 texts and finally the mystery was solved with an extremely high degree of certainty, which has only increased since, due to new evidence being found daily. During the expansion phase, I had written entire essays that had guessed other likely ghostwriters, such as "Anthony Munday", which is really a blatant pseudonym when it is alternatively spelled as "A Monday"; I kept finding evidence to contradict these earlier attribution conclusions, until I arrived at the final six ghostwriters, and I still have not been able to find any evidence to counter these conclusions. If I had found contradictory evidence, I would have proposed other ghostwriters. I explain how I excluded all other names that I considered (thousands of different bylines) across the BRRAM series.

The goal was and is simply to determine: who wrote the British Renaissance (and why)?

If I had only used computational-linguistic analysis, I would have taken (and did take) steps to check the statistical significance of the results to make sure that they were not random, and that they indeed must indicate groupings of linguistic similarity. I explain the statistical verifications I undertook in a chapter in Volumes 1-2 of BRRAM. And I also used several other types of analysis (handwriting, financial documents, confessions in autobiographies/letters) to arrive at the same attribution conclusions by entirely different pathways.

You are thinking of "efficacy of the measurements" as the required application of "Randomized controlled trials" in medical experiments, such as https://pubmed.ncbi.nlm.nih.gov/7723466/. In this case the use of "nonrandomized studies" required the added use of "statistical adjustments such as matching or covariance analysis to adjust for inequalities or to remove biases between the treatment and control groups". While these seem like convoluted terms, they just repeat that the data is edited to account for biases that might have influenced the results. Such biases exist when people are overweight or extremely ill and thus data about them (if unedited) could skew how well a given drug works. In contrast, computational-linguistic analysis does not measure people with the complexities of human organs/health etc., but is merely measuring the quantifiable letters, words, phrases, sentences, paragraphs and books. Of course, I accounted for spelling variations in Early Modern English vs. Modern English, by testing both types of the same texts, and otherwise worked to clean up the files to delete all possible corruptions of the language. By using a combination of 27 different types of tests, I ensured that a match indicated several similar elements (and thus not merely a chance occurrence that might have been statistically insignificant). And by combining them in a simple binary formula, I avoided the various types of bias that would go into interpreting the data (where most previous computational-linguists have erred, as we have seen earlier in this discussion). "Efficacy" means "the ability to produce a desired or intended result"; well, when it comes to the re-attribution of texts, it is best not to have any "desired or intended result". Having such desires for "intended results" has led previous computational-linguists to conclude that if a study confirmed a stated byline-attribution this proved to be an efficacious conclusion, when it merely meant that not enough texts across enough bylines had been tested to check if the attribution would remain true in a larger corpus. I started this study with the goal to learn who actually wrote these texts, and without any desire or motive either to confirm or to discredit previous attribution findings.
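To make the general shape of such a test-combination concrete, here is a toy sketch in Python; the four feature names, the 10% tolerance, and the scoring rule are illustrative assumptions for this thread, not the 27-test BRRAM formula itself.

import re
from statistics import mean

def features(text):
    # A handful of invented stylometric measurements of a single text.
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "punct_per_1000_chars": 1000 * sum(c in ",.;:!?" for c in text) / max(len(text), 1),
        "lexical_density": len(set(words)) / max(len(words), 1),
        "avg_word_length": mean(len(w) for w in words) if words else 0.0,
        "avg_sentence_length": len(words) / max(len(sentences), 1),
    }

def binary_matches(text_a, text_b, tolerance=0.10):
    # Each measurement becomes a binary "match"/"no match" decision; the
    # returned count (0-4 here) is the combined similarity score.
    fa, fb = features(text_a), features(text_b)
    return sum(
        abs(fa[k] - fb[k]) <= tolerance * max(abs(fa[k]), abs(fb[k]), 1e-9)
        for k in fa
    )

print(binary_matches("To be, or not to be, that is the question.",
                     "Whether 'tis nobler in the mind to suffer."))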

The "ramification" or consequence of my findings has been close to zero because they have not been "taken seriously" in academia, as you guys have been repeating across this thread. In an ideal world, the consequence of such history-changing findings would be the change of the attributions in the history books and texts' bylines.

905MrAndrew
Apr 19, 2022, 5:32 am

>894 paradoxosalpha: that's exactly what rogue AI would say.
>897 faktorovich: that's exactly what rogue AI would say.
>900 faktorovich: that's exactly what rogue AI would say.
>903 Crypto-Willobie: that's exactly what drunk AI would say.
>905 MrAndrew: that's exactly what rogue AI in an infinite loop would say.

906paradoxosalpha
Apr 19, 2022, 9:26 am

>905 MrAndrew:
Yes, the internet is largely forged by AI--a far more parsimonious explanation than the notion of human "posters" promoted by the intellectual Establishment.

907faktorovich
Apr 19, 2022, 11:18 am

>906 paradoxosalpha: That is not a conspiracy theory, but our reality again. I could not find any studies on the percent of social media followers that are likely to be purchased vs. authentic people. I did find this article: https://www.cnbc.com/2019/07/24/fake-followers-in-influencer-marketing-will-cost.... It explains that companies spent 1.3 billion giving money to "influencers" with fake followers just related to product marketing, so there are billions more wasted on many other sides of this problem. People just don't care enough about other people, and certainly not about celebrities who are not part of their real lives to follow folks online like Cristiano Ronaldo who has .5 billion followers. Without checking who this person is, can you guess what he does? My guesses were acting or business, and both were wrong. Fandom in reality has been on a decline together with the precipitous drop in the quality of the arts (dropping to 5 million record sales for the top-selling album of the year in 2021 during a pandemic, when nobody could go outside to a concert, from the 70s when 40 million records sold was a regular annual occurrence). Amidst this sea of problems, one I notice in your point is the phrase "intellectual Establishment" and it also draws my attention to the phrase "artificial intelligence"; I think both of these phrases can be corrected to be more truthful if you take out "intelligence" and just leave an artificial Establishment that is a kind of a blob that is acting with goals like "want more money for least work", and "want all the money", and "want everybody to be stupid enough to think I'm smart." I hope this helps you think through all this.

908susanbooks
Modificato: Apr 19, 2022, 2:22 pm

>896 anglemark: "though I don't know if any part of her (faktorovich's)* methodology would be accepted by literary scholars either"

It would not.

*(substituting parens for brackets since LT Talk doesn't read the latter)

909Keeline
Apr 19, 2022, 12:49 pm

>904 faktorovich: ,

It seems to me that if you are trying an analysis technique that is new and is not proven (to others), it is unwise to go from a rather-complex case to an impossibly-complex case. You would want to make sure that the technique is well understood and works reliably for very controlled circumstances. And when I say "works" I don't mean that it reliably says that the color of the clear sky on Earth is bright yellow at noon.

One approach would be to gather a number of people (maybe 10, perhaps more) and have each of them type in two or three writing samples in a proctored manner so nothing is pasted in from another source. These are submitted to a third party who retains the names or an identifying mark of the typist. The sample texts are provided to you for your analysis without any names or identifying marks. Then you perform your analysis. When this is done, then and only then, are the numbers of the authors revealed to you. This is a technique that would be analogous to the scientific method that has been mentioned multiple times in this thread.
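To make the scoring of such a blind exercise concrete, here is a minimal sketch in Python; the sample IDs, the toy labels, and the pairwise scoring rule are all invented for illustration and are not anyone's published protocol.

from itertools import combinations

def pairwise_accuracy(predicted_groups, true_authors):
    # predicted_groups: {sample_id: group_label} submitted before unblinding;
    # true_authors: {sample_id: author} revealed by the third party afterwards.
    correct = total = 0
    for a, b in combinations(sorted(true_authors), 2):
        same_pred = predicted_groups[a] == predicted_groups[b]
        same_true = true_authors[a] == true_authors[b]
        correct += same_pred == same_true
        total += 1
    return correct / total if total else 0.0

# Example: three writers, two proctored samples each, one sample misgrouped.
truth = {"s1": "A", "s2": "A", "s3": "B", "s4": "B", "s5": "C", "s6": "C"}
guess = {"s1": 1, "s2": 1, "s3": 2, "s4": 3, "s5": 3, "s6": 3}
print(pairwise_accuracy(guess, truth))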

The method you describe is like someone just learning to swim and then jumping into turbulent shark-infested waters. You say that the people who were looking at it for British 18th century texts had doubts. To my mind, that is not the time to jump to the 16th century. You have to convince not only yourself but others.

Part of that is understanding and clearly explaining how each test is relevant before combining them with others which would tend to hide differences rather than reveal them.

James

910Keeline
Apr 19, 2022, 1:06 pm

>907 faktorovich:

Having people cheat the "influencer" system is a byproduct of paying for people to promote a product because they have a bunch of "followers." A larger audience is going to get more pay so there is an incentive to create the appearance of a larger than real audience.

This is like designing tax codes. Most of the time when you tax something at a higher rate, you get less of that activity reported as it is natural for people to want to minimize their tax obligation.

We see this on eBay now. They have different rates for different categories of items. They set the rates based on what they want to see more of in listings. 20+ years ago when I started using eBay, they billed themselves as "the world's largest garage sale" and many interesting and hard-to-find items could be found there. However, as they had investors and wanted to maximize profits, they sought ways to expand their market into retail areas dominated by other entities. So they started going after less used material and more new material. The last time I checked, books (which interest many of us here) had the highest rate of fees. An example of the lowest rate was for athletic shoes over US$100 in closing price. These are the name-brand shoes that are most likely to be counterfeited. So one gets the idea that this is what eBay wants more of and they want fewer old books sold.

When you set up a system that creates a financial incentive for certain activity, there will be greedy cheaters who try to maximize that and "game the system."

But, this is not all of society or even a majority of it. Sure it may be larger than we'd like but it is an indication of an unethical or even criminal element. If there are no consequences for the bad actions, we'll get more of it.

There is a notion of "unintended consequences" where the result is not always part of the stated goal. But some are devious enough to set up a set of rules with a stated goal but another one under the surface. If that is true, maybe it's not so "unintended" after all but just "unstated" consequences.

This is all rather tangential to the main topic of the thread though.

James

911faktorovich
Apr 19, 2022, 9:19 pm

>909 Keeline: You are describing the degree of impossibility of a case without having opened the book that describes the case. I explain the said case fully, so it is a possible case to prove, even if it will take me around 20 volumes to do so. Since I am already 17 volumes into it, it is now pointless to suggest that the task is impossible, since it is nearly completed.

It would be some kind of a magic trick if you were able to understand any complex technique without reading the book that describes it fully. Just as you would not attempt to understand "Newton's law of universal gravitation" from merely the formula, you cannot give up on your capacity to understand my computational-linguistic method due to its difficulty without the hard work of reading the manual that explains its elements.

Computational-linguistic attribution is also very different from stepping outside and seeing the color of the sky. The color of the sky might indeed be a bright yellow at noon if one steps out in the midst of a bomb explosion that covers the sky. The complexities of sky-color are entirely irrelevant for understanding a perfectly controlled environment of the sterile digitized text. There is a precise attribution possible by dissecting the letters, words, etc. that make up a text with an unbiased formula designed to compare texts against each other. The puzzle would only appear impossible if a researcher fails to accept the data's results for what they are, and instead brings with them biases or theories regarding what the data is supposed to show. A meteorologist should be able to guess with relative certainty that it should be a clear blue sky at some specific point of a day; if he goes outside at that time and finds a yellow sky or a cloudy sky; then, he has made some error in the calculation, or something is blocking his view. In contrast, a computational-linguist has already made the worst possible mistake if he has guessed before looking at the data which texts must prove to be by a single author (and by which specific author), and this mistake is aggravated if he then "adjusts" the data or shrinks the corpus to arrive at their own biased conclusion instead of recording exactly what the unmanipulated data is saying about the attributions of the texts.

Your suggested experiment would require extremely rigorous oversight and controls, as it has to include monitoring all writers' progress to avoid all hints of plagiarism, cheating etc., or the results would be unusable. And it cannot be done on only 10 people, but on a statistically significant set of at least 100 people. All of these participants would have to be paid for their time, and a large computer classroom would have to be rented. The cost would total at least $50,000 with the very minimum elements you are proposing. It is thus an absolutely ridiculous waste of money, when there are so many very different books that I have already tested from these centuries that confirm the accuracy of my method.

The method has to fit the experiment. Imagine if I was testing the influence of vegan food on the blood pressure by feeding folks in a lab and regularly measuring their blood pressure. Now, you came along after the experiment is finished and told me that I had to also give all of these people a written exam where they described if they feel nervous while their blood pressure is being tested. You are proposing an entirely different and unrelated experiment that you are free to attempt, but does not answer any research questions I asked, nor is necessary to answer the questions that I did ask.

Testing a corpus of texts through a series of linguistic measures and then comparing them in similarity to each other has been the basic element of the standard scientific method employed in this field. I have never heard of a written exam of the type you are proposing, because this sort of test would be needed to test current students for their propensity to plagiarize, and not book-length works to solve their attribution mysteries. Just because you have come up with a different method of science does not mean you have a higher understanding of what to you seems like a mysterious "scientific method". Choosing the exactly correct (and the simpler the better) method of scientific testing is how correct results are attained. You have to ask, what is the research question? For BRRAM, it is: who wrote the texts of the Renaissance? The method has to arrive at an answer, and not veer into an entirely different question of: are humans capable of original writing/ thought/ non-plagiarism?

I jumped into the 16th-17th centuries over two years ago, so I've been exploring this field for a very long time now. I had spent over a year on the 18th century before that. I have been trying to point out that neither you guys nor any of the journal reviewers who rejected my work have actually read my research. No, the problem is not that they did not "believe" it, but that they failed to attempt to comprehend it.

If you still don't understand why punctuation-frequency or any of the other tests I used are relevant, you have to ask a specific question, as I have no idea how they can be irrelevant from your perspective. The combination of the tests is just a mathematical tool to turn different measurements into a comparable binary system. This combination does not change the complexity of the individual test results that remain available for researchers' access on GitHub.

912faktorovich
Apr 19, 2022, 9:43 pm

>910 Keeline: Thus, previously there was incentive for "celebrities" to create ground-breaking music videos, or to invent new dance moves, or even to create new sub-genres of music to prove their "influence" or worth over competitors to win advertising dollars. And now, purchasing "followers" is sufficient to make money from advertisers, and one can be creating absolutely nothing of any worth other than a transaction between one unaware businessman and one fraudulent "influencer".

I did not propose policing these frauds (though a comparable fraud in the "real" world would be equivalent to thousands of counts of identity fraud). I simply pointed out that "bots" (and ghostwriters) are not a topic for conspiracists, but rather a very massive portion of the garbage dump that makes up most of the internet.

I have purchased items that turned out to be counterfeit and even extremely dangerous on both EBay and Amazon. One that stands out is the bug-zapper that was just standing on the counter, when it suddenly exploded (not too massive, but there was a bang) and caught on fire, so that I had to carry the thing out of the house before the rest of my house caught on fire. And then Amazon asked me how I was going to mail it back to them, to which I had to ask if they seriously wanted me to mail a bomb back to them through the mail to get a refund, and they agreed that this was not necessary and issued a refund without it. So the internet is a garbage-fire, and it's a pretty dangerous one.

I cannot imagine who wants to make money from exploding bug-zappers, but maybe there should be minimum regulations in place so that all of us do not regularly receive such products of Capitalism.

Every time a legislator in the US sets out to "fix" the system, they introduce a new loophole only for their friends, or they dismantle an entirely new barrier to fraud while appearing to fix a problem. A simple comparison to see why the US has the worst regulatory system on the planet is here: https://www.marthastewart.com/2225508/beauty-ingredients-banned-united-kingdom-u.... These 1,300 products have already been proven by the EU to be dangerous, and the US is so corrupt that not even European scientists can budge the completely inactive US regulators into noticing that their hair is potentially burning... or the like. Inflation does not have to run away into infinity if there were actually a competent economist (or a group of them) at the top, and if those at the top were not there merely on the strength of paper-degrees and ghostwritten books. Plagiarism and ghostwriting are at the foundation of all problems in the world that need (but cannot access) an intellectual answer.

913susanbooks
Apr 20, 2022, 9:42 am

>912 faktorovich: You purchased a bug zapper (which is environmentally dangerous, killing, as it does, the good bugs that prey on mosquitoes and keep an ecosystem healthy) that didn't work, therefore "the internet is a garbage fire."

If that's not sound logic, I don't know what is.

In graduate school I got food poisoning from eating at the school cafe. Thus, food, universities, and cafes are pernicious and must be suppressed.

914spiphany
Apr 20, 2022, 11:34 am

>911 faktorovich:
There is a precise attribution possible by dissecting the letters, words, etc. that make up a text with an unbiased formula designed to compare texts against each other. The puzzle would only appear impossible if a researcher fails to accept the data's results for what they are, and instead brings with them biases or theories regarding what the data is supposed to show.

Except that there is no such thing as scientific research that doesn't rely on some set of assumptions or another (i.e., "bias"). A framework if you will.

Scientific experiments are designed in such a way as to try to minimize the effects of whatever assumptions are inevitably brought into the proceedings (this is why there are things like controls, blinding, H0 and H1, etc.), but that doesn't mean they produce some mythical form of pure, unbiased data.

Just because you are using a measurement that takes the form of numbers rather than something subjective like character development doesn't mean that there aren't choices being made in what is measured and how it is measured. It doesn't mean that the results don't require interpretation.

Because you are indeed interpreting your results. Your conclusions are not dependent on previous assumptions about authorship, that's all. What your numbers tell you is how similar or different the texts are, according to certain measures. Everything else is your interpretation as to what the results mean.
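As a concrete illustration of the H0/H1 point, here is a toy permutation test in Python; the "sentence length" numbers are invented, and this is only a sketch of the kind of null-hypothesis check being described, not a reconstruction of anyone's study.

import random

def mean_diff(values_a, values_b):
    return abs(sum(values_a) / len(values_a) - sum(values_b) / len(values_b))

def permutation_p_value(values_a, values_b, n_shuffles=10_000, seed=0):
    # How often does randomly relabelling the texts produce a group difference
    # at least as large as the one actually observed?
    rng = random.Random(seed)
    observed = mean_diff(values_a, values_b)
    pooled = list(values_a) + list(values_b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        if mean_diff(pooled[:len(values_a)], pooled[len(values_a):]) >= observed:
            hits += 1
    return hits / n_shuffles

# e.g. average sentence lengths of texts under two different bylines
print(permutation_p_value([21.3, 19.8, 22.1], [27.5, 26.9, 28.2]))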

915paradoxosalpha
Modificato: Apr 20, 2022, 11:56 am

Strawman argumentation by >912 faktorovich: supposes someone here suggested bots aren't real, rather than the actual jokes at conspiracist ideation: bots have generated most of the content in this thread, including that attributed to Dr. Faktorovich (per >905 MrAndrew:), and humans aren't real (per >906 paradoxosalpha:).

916faktorovich
Apr 20, 2022, 1:18 pm

>913 susanbooks: I am vegan, so I agree that ideally I would not have to kill a single bug. But if bugs fly into my house, they are a potential health hazard as they can spread diseases, so I have to have a method for catching them. I have switched to sticky-tape traps that attract bugs with light, and have not had a similar exploding-bug-zapper experience since. To blame the victim of an exploding bug-zapper instead of the fraudulent manufacturer is a special type of low-point in moral degradation. It is a common strategy used by most manufacturers of faulty goods, as well as by rapists - both of these groups victim-blame.

But you are adding the trick that not only am I to blame for attempting to file a complaint regarding a potentially deadly product, but also you are inventing fictions about what you imagine I am saying, such as that I am campaigning to suppress all bug-zapper manufacturing and all of the internet. Then you are using an irrelevant analogy to strengthen this point. If somebody is hurt by an exploding bug-zapper or by the food in a café, the rational solution (which I support) is creating legislation/ rules/ policies or the like that prevent manufacturers from wanting to (via fines) or being able to (via regulatory testing to check products) sell faulty goods; and for cafes to perform regular testing for food-borne illnesses and discarding of spoiled items. Most of the groceries at my local Walmart/ Quanah general store are regularly spoiled, and I have to search through the shelves to find a few items that have not already rotted. Are you saying you want to be served rotten food in cafes, and to dig through piles of spoiled food at the grocery store, and to have exploding bug-zappers or other appliances, just to protect these corporations' rights to cause harm without being "suppressed"?

917faktorovich
Apr 20, 2022, 1:50 pm

>914 spiphany: As I stated previously, some types of experiments require far more or other types of minimization of "assumptions" and "biases" in their procedures than others. Let's take "blinding" for an example. Let's say you are testing the impact of hip replacements on the health of the patient. There have been some experimenters who have performed fake surgeries in such cases without telling patients if they had the real or fake procedure done on their bones. In this case the patients are blind to knowing if the procedure happened or not. In such experiments, scientists learned that not doing the procedure and doing the procedure showed almost no statistical difference, so that a particular type of surgery on bones was proven to be ineffective because issues either healed on their own, or the pain etc. continued even after the surgery. In this case, it is indeed absolutely necessary to test the procedure with a blind study to determine if the intervention is just a placebo pill (and simply helps because the patient imagines it does). In contrast, there is nothing imaginary or non-measurable about a text. A text cannot suffer from a placebo effect. A text does not mutate into something else in the middle of an experiment because it has changed its diet, and it does not sneak out to have cheat meals. A text is a set of non-changing letters and blank spaces. There is nothing that the text needs to be blinded to because the text is not a living organism and so it will not change, and it is naturally entirely blind and non-perceiving. A text will only change if the researcher or past editors or the like have changed it. I have been very careful to only take out the parts of the text that are in other bylines (such as an editor's introduction) or otherwise to clean up the text to take out corrupting markers such as random dots inserted in a bad transcription. I was also careful to create the least invasive types of tests that approach as close as possible to being "some mythical form of pure, unbiased data". Only I have avoided mythology, and instead allowed the pure mathematics give the quantitatively rational answer.

As for the interpretation of the data, I have added several non-quantitative tests to check if the purely numeric answers match these non-quantitative conclusions. For example, I looked at structural elements such as character type/ plot repetitions between "Shakespeare" texts - there is a table on GitHub with this data.

How similar and different texts are is all a computational-linguistic attributor has to know to conclude the quantitative part of the test. Basically, there is a box of sticks of different lengths, and the computational analyst sorts these out into groups of similar sizes. There can be 284 sticks of 284 different sizes, or all sticks can be the same identical size. Texts are not sticks, and that's why a set of many tests has to be applied to measure their dimensions. Once this step of sorting is finished, the interpretation stage does indeed begin. Then, the interpreter's task is to research hundreds or thousands of potential bylines within these groups to determine who among them is the most likely ghostwriter working under multiple pseudonyms. If you have a pool of linguistically matching texts with multiple bylines in it, the existence of a multi-byline ghostwriter is the only mathematically possible interpretation. Reaching this conclusion is not a bias, but rather a mathematical fact. As long as biographies, publication dates, and other elements are then interpreted fully, the researcher will eventually arrive at the one ghostwriter who could have done it, while all others could not have, for each group; alternatively, the ghostwriter might not be in the initial corpus, and the corpus might need to be enlarged to find alternative potential ghostwriters, as I did several times in my study before arriving at an attribution conclusion where I no longer had any doubts. In contrast, other computational-linguists have manipulated data so that it gave them results that reinforce the existence of a single "Author" such as "Shakespeare" (because that's what academic publishers pay them for), despite the true data saying there were five ghostwriters creating different texts under this byline.
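For readers trying to picture the "sorting sticks into groups" step, here is a toy single-linkage grouping in Python; the two-number "texts", the distance function, and the 0.5 threshold are invented for illustration and do not reproduce the BRRAM procedure.

def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def cluster(vectors, threshold):
    # Start with every text in its own group, then repeatedly merge any two
    # groups that contain a pair of texts closer than the threshold.
    groups = [[i] for i in range(len(vectors))]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if any(euclidean(vectors[a], vectors[b]) < threshold
                       for a in groups[i] for b in groups[j]):
                    groups[i] += groups.pop(j)
                    merged = True
                    break
            if merged:
                break
    return groups

# Six "texts" described by two made-up measurements each:
vecs = [(1.0, 2.0), (1.1, 2.1), (5.0, 5.2), (5.1, 5.0), (9.0, 1.0), (9.2, 0.9)]
print(cluster(vecs, threshold=0.5))  # -> three groups of two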

918susanbooks
Modificato: Apr 20, 2022, 4:11 pm

>916 faktorovich: I, too, am a vegan. If you're zapping bugs you're not exactly getting the whole vegan thing.

I mocked you for concluding the "internet is a garbage fire" bc you bought a faulty, unethical, environmentally dangerous device and Amazon only offered you your money back (rather than, what, baking you a casserole?). I don't care about your bug zapper. I was mocking your logic. Again, careful reading is a necessary skill for literary scholars, one you demonstrably lack.

As for your bug-infested home where disease is being madly spread from surface to surface, put some screens in your windows.

919paradoxosalpha
Apr 20, 2022, 5:56 pm

I've lived in Texas. The bugs take no prisoners.

920faktorovich
Apr 20, 2022, 9:24 pm

>919 paradoxosalpha: I do not use any pesticides or herbicides on my lawn, so both wild flowers and wild bugs just tend to happen. A vegan is "a person who does not eat any food derived from animals and who typically does not use other animal products." Thus, even if I ate bugs, I would still be technically vegan, as bugs are not animals. By going down the don't-zap-bugs argument you are bringing up the common "combine-harvesters kill animals indirectly while plowing the fields to harvest plants" argument; it's over-used and does not statistically make sense. There are close to zero bugs in my tiny house without me using any insecticides indoors either. In a typical year, I might use a single bug-catching strip to catch at most 15 mostly tiny flies. I saw a lot more bugs of all sorts in all of the places I rented before buying my own house. It's curious that you are doubling-down on insisting that the person who purchases a "dangerous" device is responsible for not having foreseen it was explosive beforehand, whereas the manufacturer is saintly for offering a refund even when the explosive device in question cannot be legally sent in its active state through the mail. "Mockery" is a satirical imitation, whereas an insult is a deliberately hurtful speech. If you are making a direct, false statement that my house is "bug-infested home where disease is being madly spread from surface to surface", you are not imitating anything, but rather making a direct libelous claim or insult.

921lilithcat
Apr 20, 2022, 9:37 pm

>920 faktorovich:

bugs are not animals

Of course they are.

Britannica: insect, (class Insecta or Hexapoda), any member of the largest class of the phylum Arthropoda, which is itself the largest of the animal phyla.

922prosfilaes
Modificato: Apr 20, 2022, 10:06 pm

>917 faktorovich: There can be 284 sticks of 284 different sizes, or all sticks can be the same identical size. Texts are not sticks, and that's why a set of many tests has to be applied to measure their dimensions. Once this step of sorting is finished, the interpretation stage does indeed begin. Then, the interpreter's task is to research hundreds or thousands of potential bylines within these groups to determine who among them is the most likely ghostwriter working under multiple pseudonyms. If you have a pool of linguistically matching texts with multiple bylines in it, the existence of a multi-byline ghostwriter is the only mathematically possible interpretation. Reaching this conclusion is not a bias, but rather a mathematical fact.

There are no two sticks exactly the same size, at least not sticks long enough to be seen with the naked eye. Any more than you have a linguistically exactly matching pair of texts. That's part of the complaint, that your idea of close enough is pure bias unsupported by evidence. There could be no ghostwriters working under multiple pseudonyms, but scores of authors working under communal names. This was all illegal, according to you, so whenever the police were going to raid the publishers, the authors' bodies would end up in the Thames; dead men tell no tales. Each and every Shakespearean play was written by someone different.

Moreover, what tests, and how are they measured? How are you sure that the tests you're using would group all of August Derleth's works together and separate from HP Lovecraft's? I don't know if it's possible; Derleth's Wisconsin material differs from his Cthulhu mythos works quite a bit, and his Solar Pons works are distinct from both of them. You haven't shown that it would do anything correctly on a known collection.
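The kind of sanity check being asked for here can be stated in a few lines; a minimal sketch in Python of a leave-one-out test on texts of known authorship, with made-up feature vectors standing in for real measurements.

def nearest_neighbour_accuracy(samples):
    # samples: list of (feature_vector, known_author) pairs. Each text is held
    # out in turn and attributed to the author of its nearest neighbour.
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    correct = 0
    for i, (vec, author) in enumerate(samples):
        others = [s for j, s in enumerate(samples) if j != i]
        predicted = min(others, key=lambda s: dist(vec, s[0]))[1]
        correct += predicted == author
    return correct / len(samples)

known = [((1.0, 2.0), "Derleth"), ((1.2, 2.1), "Derleth"),
         ((5.0, 5.1), "Lovecraft"), ((5.2, 4.9), "Lovecraft")]
print(nearest_neighbour_accuracy(known))  # 1.0 on this toy data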

Mathematical facts and reality have a complex relationship. Ask Newton, whose theory of gravity was "mathematical fact" until Einstein started poking holes in it. Ask Kant, who pointed out that Euclidean geometry was obviously true and no alternative could be comprehended, until Bolyai and Lobachevsky gave us an alternative and Riemann proved that both Euclidean and hyperbolic geometry were too simple for reality, which is currently described by Riemannian geometry. Way too many AIs trained to recognize objects have looked at a black human and said "gorilla", because thinking that your mathematical algorithms are right isn't good enough unless you check them with the right data.

923paradoxosalpha
Apr 21, 2022, 12:43 am

>920 faktorovich:

I don't think you were actually replying to me there. But, yeah, bugs are animals. Impressive and often hostile ones during my time in Texas. As little as I liked them then, I am worried about bug-free windshield syndrome.

>922 prosfilaes:

Careful. She'll look at Derleth's work and insist that it is actually the product of multiple authors. Scientific and irrefutable.

924Matke
Apr 21, 2022, 10:06 am

>920 faktorovich: Bugs are not animals

Indeed? What do you suggest they are then? Vegetable? Mineral? Some form of algae?

It took 920 posts, but you’ve made a statement which no amount of your verbal antics can justify. It’s just plain wrong. Demonstrably wrong.

Just like so many of your other statements, which are admittedly more complex, but just as false no matter how you satisfy yourself with your windy and faulty logic.

925faktorovich
Apr 21, 2022, 1:14 pm

>921 lilithcat: I do not eat honey or insects, so I am still vegan even by this stricter definition. Eating honey and insect-killing are points that are not disqualifiers for being vegan, as this broad definition would make it impossible for any human to be vegan. The Animalia kingdom also includes Sponges, so natural sponges would be a violation; and the Bryozoa are so small that you can swim in the sea and some might die from you taking a few strokes through the water. Since I don't swim and don't use natural sponges, I am still technically vegan even by these counts.

926anglemark
Modificato: Apr 21, 2022, 1:56 pm

>924 Matke: Give her some more credit than that! She's made demonstrably wrong statements all the way through this thread; probably starting in >19 faktorovich: but definitely from >20 faktorovich: onwards. Recently she claimed that Strunk & White's style guide is a textbook about the scientific method! (I notice that she has now admitted that she never tried to make her "computational-linguistic" study statistically verifiable, almost five months after the heroic attempts made by various people to explain to her what she would absolutely have to do to get actual results from her research.)

That being said, there is one archaic taxonomy, possibly based on something in the Bible, that defines insects as non-animals. The introductory linguistics textbook The Study of Language has a really weird semantic relations tree where the author (George Yule) places "living thing" at the top, with "creature" and "plant" below, and "animal" and "insect" below "creature". This is something my colleagues and I discovered when we were marking an undergraduate linguistics exam and were completely baffled by the number of students who had not been able to identify "insect" as a hyponym of "animal". A British colleague then mentioned that he had heard that classification before, and he thought it was vaguely Biblical. (It's bonkers of course, and as the daughter of an entomologist it hurt to have to mark that answer as correct... but it was introduction to linguistics and not to biology, so we had to content ourselves with putting a note in the margin of the exam saying "Yule is Wrong about this").

>908 susanbooks: Thanks! I was pretty certain that was the case, but it's good to have it confirmed.

-Linnéa

927lilithcat
Apr 21, 2022, 1:56 pm

>925 faktorovich:

I never said you weren't vegan. I was simply pointing out the error in your statement that "bugs are not animals".

928faktorovich
Apr 21, 2022, 1:58 pm

>922 prosfilaes: Objecting that no two sticks are exactly the same size is absurd, because if you are counting the millimeters in all measures and disqualifying all similarity whenever there is a divergence of a millimeter, then all science would come to a halt, as no precise statistical comparison would be possible.

I have indeed found some extremely similar texts in the 18th century that match on most of the tests and indicate a professional author creating similar novels in the same genre. There are also texts that are co-written or written by multiple authors and they show a smaller degree of similarity to each of the authors because the similarity is split between them; these multi-authored texts happen to have strongest matches to other texts written by this particular grouping of authors, so identification is still possible.

There are many previous computational-linguistic studies that have attributed anonymous or otherwise bylined texts to other bylines based on finding similarity between texts. So there is nothing new or unprecedented about such analysis. My analysis is simply truly unbiased, as I have no motive to reinforce or contradict past attributions, whereas insiders paid by mainstream publishers of these Renaissance texts are financially invested in reinforcing existing bylines.

"There could be no ghostwriters working under multiple pseudonyms, but scores of authors working under communal names." Imagine there is a box of crayons: red, blue and green. I am stating that each color represents a different ghostwriter, working under different pseudonyms that are written randomly on one or more of the individual crayons. What are you saying? 1. You can be saying that the linguistic distinctions between the "colors" are non-existent, and also that the "authors" are using names completely randomly or communally without any identifiable pattern. 2. Or are you saying that there are still three "authors" working under "communal" names written on their different crayons. By saying the latter are you simply refusing to label these same entities as "ghostwriters" and "pseudonyms"?

"This was all illegal, according to you, so whenever the police were going to raid the publishers, the authors' bodies would end up in the Thames; dead men tell no tales. Each and every Shakespearean play was written by someone different." What? I am not saying anything of the sort. I am saying that all 6 ghostwriters got away with this scheme and lived to an old and wealthy age. I am also saying that there were several "printers", booksellers and "authors" who were raided, imprisoned, executed and otherwise legally harassed by the State; they were not dumped in the Thames by mobsters, but rather legally dumped in the Tower by the legal system. I am saying that these ghostwriters ghostwrote the Laws of Britain, and its legal opinions, and its radical theological pamphlets, and pamphlets decrying radical pamphlets, etc., etc. They had the power of the State because the rest of the state preferred hiring ghostwriters to investing any intellectual effort to running the details of government. The GitHub file specifies which "Shakespeare" plays were ghostwritten by which of the ghostwriters - most of the "Shakespeare" comedies were ghostwritten by Jonson, while most of the tragedies were by Percy.

"More over, what tests, and how are they measured?" I already answered this question: "27 tests measured: punctuation, lexical density, parts of speech, passive voice, characters and syllables per word, psychological word-choice, and patterns of the top-6 words and letters." Please clarify if you are asking me to specify something about these tests.

"How are you sure that the tests you're using would group all of August Derleth's works together and separate from HP Lovecraft? I don't know if it's possible; Derleth's Wisconsin material differs from his Cthulhu mythos works quite a bit, and his Solar Pons works are distinct from both of them. You haven't show that it would do anything correctly on a known collection." A few of you guys have asked me to check specific authors and texts to determine if the method would work. When you first asked, I offered to test any set of texts and upon request I performed the "Lunch Test": https://github.com/faktorovich/Attribution/blob/master/LibraryThing%20-%20Lunch%... The specific question you are asking about here should be framed as a mystery, but you are asking as if the authorship of these texts is a known fact. If you were presenting a box of crayons, you might as well have said, Derleth's "Cthulhu" labeled crayons seem bluish, and HP Lovecraft's science fiction/ fantasy crayons are also bluish; but Derleth's "Wisconsin" crayons are greenish, and Derleth's "Solar Pons" crayons are reddish. Then, you are telling me that if my computational-linguistic method was correct, it would label the crayons you intuitively think are three different colors but are labeled as "Derleth" on their sides as indeed being the same color that confirms the byline; and you are saying that Lovecraft's crayons would have to be decided to be any other color from Derleth's for the attribution to be accurate. This is an entirely irrational and entirely biased approach. An attribution method cannot be tested in correctness by the number of bylines it confirms match the actual bylines, but rather it has to reveal the true attributions that might be extremely different from the byline assigned to any corpus of texts. This is why I did not stop at the computational method, but kept digging to find a 17+ books of separate evidence to confirm my attributions for the Renaissance. The research I have done would be equivalent to if I had complete access to every piece of handwriting/ books etc. that Derleth and Lovecraft ever wrote, alongside with their tax records, receipts, library collections, business contracts, and pretty much all other documentations of their personal and professional lives. After looking through all of these materials, I would be telling you the precise groupings and attributions of the texts (and you apparently would be raising the objections you are raising).

Yes, indeed, and I have taken the next step in computational-linguistics, and combined the pure mathematics with an exhaustive interpretive analysis and research. You, in contrast, are staying back on the shoreline and yelling that a trip out there where I am going will lead to the End of the Known World.

929abbottthomas
Apr 21, 2022, 2:44 pm

>928 faktorovich: So your “analysis is simply truly unbiased, because (you) have no motive to reinforce or contradict past attributions…..”
Are you not motivated by your need to justify your weird ideas?

And, while I am at it, your busy ghostwriters wrote all the Laws of Britain and its legal opinions, did they? What were all those judges beavering away in courts of first instance and appellate courts delivering their judgements and establishing case law doing? Their activities are well and reliably recorded as far as I am aware. Are you asking us to believe your guys made it all up?

Do you know Vladimir Putin?

930faktorovich
Modificato: Apr 21, 2022, 2:53 pm

>926 anglemark: "Veganism" is "a philosophy and way of living which seeks to exclude – as far as is possible and practical – all forms of exploitation of, and cruelty to, animals for food, clothing or any other purpose; and by extension, promotes the development and use of animal-free alternatives for the benefit of humans, animals and the environment. In dietary terms, it denotes the practice of dispensing with all products derived wholly or partly from animals’." --Donald Watson, 1944, coiner of the term "vegan"

Each word in the dictionary is a construction that some body defined and classified and set the limits for. And there are many words with political weight that have multiple conflicting definitions that were written by people with conflicting interests, or perspectives. Thus, "Animalia" should be used as a separate term from the conversational term "animal". In 1902, the definition for "Animalia" was introduced as: "one of the basic groups of living things that comprises either all the animals or all the multicellular animals". Flies are protostomes, which are part of the animal kingdom without backbones (whereas humans and other beasts conversationally classified as "animals" have backbones, and are thus called chordates). Any living creature that can move on its own is part of the Animalia kingdom. The problem with this classification is that all plants move as they bloom, and stretch and crawl and even can communicate basic signals about their environment. Plants also reproduce, and perform many other functions that distinguish "Animalia" organisms from the Plant kingdom. Carnivorous plants even eat Animalia. It is not easy either to defend or to refute the claim that an insect is an animal, because most people have a creature with a spine in mind when they use the generic term "animal". If you were surprised on the street yesterday and somebody asked you if a sponge was an animal, would you have said, "Yes"? The problem is the lack of scientific specificity in definitions. For example, here are some definitions for "animal" I found in https://www.collinsdictionary.com/us/dictionary/english/animal:

"An animal is a living creature such as a dog, lion, or rabbit, rather than a bird, fish, insect, or human being."
"Any such organism other than a human being, esp. a mammal or, often, any four-footed creature".
Part of the problem is that this term has a theological foundation: "living being. anima, animus, breath, air, life principle, soul".
"A mammal, as opposed to a fish, bird, etc".
In contrast here is a scientific definition: "any member of the kingdom Animalia, comprising multicellular organisms that have a well-defined shape and usually limited growth, can move voluntarily, actively acquire food and digest it internally, and have sensory and nervous systems that allow them to respond rapidly to stimuli: some classification schemes also include protozoa and certain other single-celled eukaryotes that have motility and animal-like nutritional modes". In this definition "protozoa" (including flies) are singled out as very different from the norm, so that they require specific mention.

So, if you are going to correct my simple "insects are not animals" statement, you have to also edit all of these different dictionaries that equate "animals" with "mammals" or define them as all non-human living creatures. Watson's definition of veganism is philosophical, and not scientific; it is a moral belief system, and not a scientific cataloging of what precisely those who label themselves as "vegan" are allowed to eat. In my previous house in Brownsville, TX, I had a scorpion infestation (in a very nice, year-old single-family house on a foundation), so that I had to kill at least one scorpion every week to avoid being stung and killed in my sleep. While flies are obviously not as dangerous as scorpions, it is equally rational to catch flies and to minimize the number of potentially disease-carrying flies in a home for one's health. The point of the vegan philosophy is reduction of harm to all, and thus living in a truly scorpion- and fly-infested house would be very anti-vegan.

931lilithcat
Apr 21, 2022, 2:59 pm

>930 faktorovich:

you have to also edit all of these different dictionaries that equate "animals" with "mammals"

If that's what they say, they ought to be edited! Are chickens and other birds not "animals"? What about reptiles?

932faktorovich
Apr 21, 2022, 3:04 pm

>929 abbottthomas: I am motivated to find evidence to determine the true attribution of the texts of the Renaissance. My ideas are based on the evidence I have found, and not the other way around. I did not finalize the attributions until all pieces of the evidence agreed on what the attributions were. In contrast, previous computational-linguists have judged it to be an error if the attributions were anything other than the original bylines on the tested texts.

You are imagining "beavering... judges". If you looked at the evidence, you would be looking at legal documents written in a similar handwriting across the Renaissance on all sides of the debates. So if you object there were many judges doing original legal research during this period; go into the archives and find 10 different handwriting styles in 10 differently named judges' work with a signature of each of these judges on these pieces of evidence. If you can't find such evidence; then, stop imagining what happened 400 years ago, and consider that reality might be very different from the legends spun in history books.

I do not know Vladimir Putin, nor any other Russians. I live like a hermit in a tiny house in Texas. I go shopping once per month. I do not have any friends, and have never had any significant romantic relationships, as I am anti-sexual. Do you have any other questions?

933faktorovich
Apr 21, 2022, 3:05 pm

>931 lilithcat: Fantastic. Go ahead and write to these dictionaries' editors, and recommend edits.

934Keeline
Apr 21, 2022, 3:32 pm

Look at the full page for the https://www.collinsdictionary.com/us/dictionary/english/animal link again. It does not equate "animal" with "mammal" but says that it covers a wide range of organisms with certain traits. At one point it gives mammals as an example with the abbreviation "esp. mammals" which the reader is expected to understand to mean "especially mammals" but not exclusively mammals.

This is a diversion from the main topic.

James

935abbottthomas
Apr 21, 2022, 4:23 pm

932> I must apologise for my stupid, rude and irrelevant final question above. Sorry.

I still have problems with the judges. You know that much of English law is based on case law. The judges involved were hearing real cases and making rulings rather than creating something de novo. Where did your ghostwriters get involved in this process?

936anglemark
Apr 21, 2022, 4:23 pm

>930 faktorovich: You seem to have added your response to the wrong post. I don't know which post you intended to reply to, but it can't be post 926 – there is no mention of veganism nor of dictionary definitions of the word "animal" there.

937anglemark
Apr 21, 2022, 5:11 pm

>935 abbottthomas: This is interesting. I don't know very much about this – that is, I know that court proceedings were recorded by scribes who also took depositions (these are some of the best sources that exist for spoken English before the age of recorded sound). Would their transcripts constitute the actual text of the law as well, such that the precedents that judges consulted for their rulings were found in transcripts of earlier cases?
-Linnéa

938faktorovich
Apr 21, 2022, 8:37 pm

>934 Keeline: Yes, there are many contradictory definitions in the Collins Dictionary and across most dictionaries. That's my point. If you are going to object to my use of a word based on a dictionary definition, then you have to be prepared for me to counter, with equal rationality, with a quote from a dictionary that offers a contrary definition.

939faktorovich
Apr 21, 2022, 8:46 pm

>935 abbottthomas: To understand, imagine the Supreme Court of the US (or a fictional country) was made up of complete incompetents who paid for their paper-degrees, publications etc., and bribed their way into these positions in order to capitalize on favors/ bribes they could get for voting certain ways on decisions. Imagine there was no public education system in place yet, so these judges could be entirely illiterate, and might barely understand spoken English/ Latin (as they might speak German/ French or a heavy variant of English). So these judges could only barely read an execution order, or submit an opinion written by a ghostwriter for them. Most judicial cases would be decided in those days without written opinions. The major courts would pay clerks to write up records to reflect decisions, and if these were also illiterate they might have sub-contracted ghostwriters to perform basic record-keeping, or record-making, for them. These records and case-law decisions could be generated without the incompetent judges doing anything other than occasionally sitting and staring, or occasionally asking questions that were unlikely to be repeated in the "official" record of proceedings. Does this clarify what this process might have been like?

940faktorovich
Apr 21, 2022, 8:51 pm

>937 anglemark: Yes, it is a common explanation that "scribes" transcribed records and that's why so many of these records share the handwriting of a single scribe. However, my linguistic data explains that these "scribes" were not only transcribers, but also the ghostwriters of these texts. The absence of preceding or original legal documents in multiple hands, bearing multiple judges' signatures, etc., is part of the evidence that confirms my conclusion. If you refer back to the handwriting analysis file I linked to, you should start to understand, as some legal documents are included.

941prosfilaes
Apr 21, 2022, 10:02 pm

>928 faktorovich: Objecting that no two sticks are exactly the same size is absurd because if you are counting the millimeters in all measures, and if there is a divergence of a millimeter you are disqualifying all similarity; then, all science would come to a halt as no precise statistical comparison would be possible.

I thought you took a class in statistics? The whole thing is the art of handling messy numbers and finding patterns through the noise.

Imagine there is a box of crayons: red, blue and green. I am stating that each color represents a different ghostwriter, working under different pseudonyms that are written randomly on one or more of the individual crayons. What are you saying? 1. You can be saying that the linguistic distinctions between the "colors" are non-existent, and also that the "authors" are using names completely randomly or communally without any identifiable pattern. 2. Or are you saying that there are still three "authors" working under "communal" names written on their different crayons. By saying the latter are you simply refusing to label these same entities as "ghostwriters" and "pseudonyms"?

Take Crayola's box of 64 crayons. You're saying that they're either red, blue or green at the base. I'm saying they all look slightly different, and grouping them into a certain number of colors is going to be arbitrary. Maybe there is some fundamental difference in the wax or additives that justify grouping them in less than 64 piles, but I'm going to need to see more than some random numbers.
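
To make that concrete with a toy sketch (made-up crayon data and a bare-bones k-means, nothing to do with anyone's actual corpus): a clustering routine will happily produce however many piles you ask it for, which is exactly why the number of groups needs some independent justification.

# Toy illustration: k-means makes k "piles" out of 64 crayons that all look
# slightly different, for whatever k you choose. The data here is random.
import numpy as np

rng = np.random.default_rng(0)
crayons = rng.random((64, 3))  # 64 crayons as random RGB values in [0, 1]

def kmeans(points, k, iterations=50):
    """Very small k-means: returns (labels, total distance of points to their centers)."""
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), dists.min(axis=1).sum()

for k in (3, 6, 12):
    labels, spread = kmeans(crayons, k)
    print(f"k={k}: pile sizes {np.bincount(labels, minlength=k).tolist()}, total spread {spread:.2f}")

# The spread always shrinks as k grows, so "it forms k groups" is not, by itself,
# evidence that k is the true number of underlying crayon types.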

What? I am not saying anything of the sort.

Yeah. Your failures to understand our communications do little to encourage me as to your clear understanding of the works you're translating.

I am saying that these ghostwriters ghostwrote the Laws of Britain, and its legal opinions, and its radical theological pamphlets, and pamphlets decrying radical pamphlets, etc., etc.

And I am saying that's preposterous. I'm skeptical of the literary works of the British Renaissance being written by a tiny group of writers, but it's conceivable. That the preachers and lords whose job was, in part, to produce written works all turned to that same group of writers I find beyond belief.

Laws in the modern day aren't written by legislators, but they're also not written by Dan Brown. Radical theological pamphlets are written by people who are defined by their radical theology, not their writing skill. Maybe the Archbishop didn't write the works attributed to him, but they would have been written by someone who worked for him.

An attribution method cannot be tested in correctness by the number of bylines it confirms match the actual bylines, but rather it has to reveal the true attributions that might be extremely different from the byline assigned to any corpus of texts.

What you're saying is that an attribution method can't be objectively tested, but it is correct if it matches your biases. August Derleth wrote in multiple genres. It's well documented that he did. It's well known that Lovecraft wrote certain works, and that Derleth wrote pastiches of Lovecraft's style. Separating an author from the pastiches of his works and identifying an author working in multiple styles are two tests that must be passed if this method is to be taken seriously, but you've rejected these tests out of hand.

942prosfilaes
Apr 21, 2022, 10:28 pm

>938 faktorovich: You demonstrate quite well an ability to quote works out of context, a failure to understand words in context, and a willingness to argue instead of understanding. Insects are animals. There are older definitions of animal that might be used in other contexts, but claiming that insects aren't animals in this context doesn't show understanding.

>939 faktorovich: To understand, imagine the Supreme Court of the US (or a fictional country) was made up of complete incompetents who paid for their paper-degrees, publications etc., and bribed their way into these positions in order to capitalize on favors/ bribes they could get for voting certain ways on decisions.

Of course. In reality, it's rare that any organization is completely made up of incompetents who let others do the work. Set up any such system and you'll get a few people who are idealistic and insistent on trying to make things better, and a few people who believe they're really that good and write their own reports, however incompetently.

The major courts would pay clerks to write up records to reflect decisions, and if these were also illiterate they might have sub-contracted ghostwriters to perform basic record keeping, or record making for them.

That's what I always do, find someone illiterate to write up my records. And again, part of the problem is that you have six well-known authors, not a small group of nameless clerks. I'm not arguing that Jon Favreau (Obama's speechwriter) doesn't exist, and it's not inconceivable he may have farmed out some or all of the work to someone else. That someone else is not Dan Brown or any other major fiction writer.

943Keeline
Apr 22, 2022, 10:45 am

>938 faktorovich: ,

Although I know of the publisher HarperCollins (which started as Harper & Brothers), I have not heard of this Collins Dictionary before. I don't recall seeing a printed one in the same way that I have heard about the Oxford English Dictionary (gold standard) and the Merriam-Webster Collegiate Dictionary (silver standard).

As already mentioned, the propensity of modern dictionaries to describe modern usage rather than prescribe well-established definitions can be a problem. However, I looked through several of the definitions in the blocks and I didn't see a lot of contradiction. There were nuances to be sure but nothing saying "animals" = "mammals", only "animals" strongly includes "mammals" in the usage. There was also nothing to say that insects were not animals.

Indeed, when the same source (Collins Dictionary) is consulted for "insects" the word animal is used repeatedly and consistently.

https://www.collinsdictionary.com/us/dictionary/english/insect

I've had only a few conversations about dietary choices with a good friend who is a vegan. We attend the same academic conferences in cities around the U.S., and finding restaurants with foods he will eat is a bit of a puzzle at times, but we usually solve it. Some cities are harder than others. Likewise, another friend at these conferences won't eat fish, and he can be repeatedly emphatic about that, so this goes into the mix of our restaurant selection: there has to be a selection of non-fish options for him. But what I sense is that there is a spectrum of vegan (as well as vegetarian) according to what that person is comfortable consuming.

As far as the definitions of "animals", the definitions that count are the ones that scientists use, not ancient Greeks or writers of the Bible. I certainly would not use the Urban Dictionary, for example, or a book of computer terminology to get the answer to this. Any time you use a reference source it must be appropriate so that what you are looking for is in the scope of that source. If a book has biographical entries on American authors, you should not be too surprised when the Irish author you are looking for is not included.

The English Short Title Catalog covers specific periods of materials from certain sources so other things are not expected to be found there. It is important to be aware of the scope and limitations of a source to use it properly. So if a person is cataloging a Charles Dickens book and says "not in ESTC," that is not germane since the ESTC doesn't cover that time period.

Similarly, Fiction, Folklore, Fantasy & Poetry for Children, 1876-1985 (R.R. Bowker, 1986) is a great tool for getting Books in Print-style listings for older children's books. It has a specific scope of U.S.-published children's fiction and poetry, as noted in the title. If one is looking for something British without a U.S. publication, or for a nonfiction work, one would not expect to find it there. Likewise, looking for things that are older or more recent probably won't work.

For the British Early Modern period, the Stationers Register is a list of works that were registered for review by the British government censors. It is a great resource and perhaps the only one of a semi-comprehensive nature for the period it covers. But it is not the same as a record of copyrights like you might find for the U.S. But buried in the SR you can find interesting things about the evolution of imprints as the printer-husband dies and the widow or children take over the press. Even the First Folio of Shakespeare was published in 1623 by a press run by the widow. They had not yet changed the name of the imprint.

James

944andyl
Apr 22, 2022, 11:27 am

>943 Keeline:

Collins Dictionary is a British-published dictionary. They do have printed versions. For me in the UK it is OED > Chambers > Collins, but some swap Collins and Chambers around. The reason I prefer Chambers is that it is more whimsical in some of its definitions, for example "éclair - a cake, long in shape but short in duration"; it also seems to cover more unusual and archaic words than Collins.

945anglemark
Apr 22, 2022, 12:24 pm

>939 faktorovich: What are your sources for any of that being relevant to the English legal system in the 16th-17th century? I think it's pretty well attested that kings sometimes appointed judges who would support the king, but what are the "paper-degrees, publications etc" you mention? Why would judges at the time speak German, and what do you mean by "a heavy variant of English"? And what sources support your claim that judges were illiterate? For that matter, which courts are you talking about here?

Most judicial cases would be decided in those days without written opinions.
What do you mean by "written opinions"?

The major courts would pay clerks to write up records to reflect decisions
Yes. Yes, that's what the scribes did. They were employed to transcribe the trial proceedings – I think most courts still employ stenographers or similar. And they would also, as I said, take witness depositions, etc.

...occasionally sitting and staring, or occasionally asking questions that were unlikely to be repeated in the "official" record of proceedings.
Source for this?

>940 faktorovich: Why the scare quotes around "scribes"? A person transcribing the proceedings in the courtroom is a scribe, that's the job title, and not a "common explanation". When you say that they "were not only transcribers, but also the ghostwriters of these texts", what do you mean by "ghostwriters"? What is your understanding of what was going on in court during a trial? And when you talk about your "linguistic data", what is your corpus of law texts? Which exact documents, from which courts, written in which years – and which editions have you used? The bibliography linked from Github doesn't include any court proceedings or other legal texts as far as I can tell, but there is no genre identification and I am not about to do a close reading of a couple of hundred bibliographical entries. How did you get access to the court documents, in order to look at the handwriting?

If you refer back to the handwriting analysis file I linked to, you should start to understand, as some legal documents are included.
There was no analysis in the file, just some images of handwritten text with bibliographic details. But I'll ask about one of them, just to get some idea of what it is you have been trying to do: the second to last image is of the "Order of penance by the Court of High Commission enjoined on Richard Black of Iden, Sussex: 1594 September 19. Signed by Richard Cosin, Edward Stanhope and Richard Bancrofte." Is this one of the documents you claim was written by your hypothetical group of six ghostwriters? What led you to that conclusion? (It's listed in the "Verstegen group" section, which might mean that you think Verstegen wrote it). As for the handwriting style – is it secretary hand, or chancery hand, or some other scribal hand? (I genuinely don't know.)

-Linnéa

946anglemark
Modificato: Apr 23, 2022, 6:36 am

>943 Keeline: I can't quite agree with you about prescriptivism in dictionaries. A living language changes and evolves, there are constant semantic shifts, and a dictionary that doesn't reflect that is soon going to be useless. On the other hand, a dictionary that immediately includes every new definition that's been attested anywhere is also not very useful, and crowdsourced dictionaries like Urban Dictionary are absolutely not trustworthy! Which dictionaries would you categorise as prescriptive, for that matter? Not the Oxford English Dictionary in any case :-) (Have you read The Surgeon of Crowthorne, about the creation of the OED? Highly recommended!)

I'd argue that all dictionaries have a prescriptive function, but they should aim to describe actual usage, provided it's well-established usage and not just Neologisms of 2021. In fact, I don't think our opinions are that far apart, I just dislike the "prescriptive" label.

Back in >872 Keeline: you said
"The descriptive dictionary definitions (as compared with prescriptive dictionaries which are almost completely gone it seems) let you down when two words that sound similar have close but different meanings. This is compounded when some aspects of language are absorbed verbally rather than in writing. Those for whom English is not a first language can be challenged by these nuances and it requires extra effort to embrace them."

Hmm, I'm not sure I understand what you mean. Why would a dictionary that takes a descriptive approach not be able to define that kind of near-synonym well, and why is it more difficult for someone who learns English through speaking and listening (I guess that's what you meant?)
-Linnéa

947Keeline
Apr 22, 2022, 12:58 pm

>945 anglemark:

As for the handwriting style – is it secretary hand, or chancery hand, or some other scribal hand? (I genuinely don't know.)


Almost 20 years ago when I was interested in this topic, I found this book to be a good introduction to the secretary hand. At the time, I could read some of the documents presented in the book. I have lost that skill through lack of use in the intervening decades.

In search of Shakespeare : a reconnaissance into the poet's life and handwriting by Hamilton, Charles, 1913-1996

https://www.librarything.com/work/198907/details/64192072

A copy for online reading (with a free account on Archive.org):

https://archive.org/details/insearchofshakes0000hami

James

948Keeline
Apr 22, 2022, 1:17 pm

>946 anglemark:

When you read online content, and some printed content, it is common to find the wrong word used, one that merely sounds like the intended word. Only some of these can be attributed to speech recognition systems. More often it is because people learn language by hearing words and not always seeing the same words on the printed page. Thus you get confusion between simple words like "then" and "than". Some of the confusions are more extreme than this.

A poorly-edited news source that feeds local-interest news to communities in the U.S. is called "patch.com" and some of its gaffes in this area are concerning. It shows that either they don't have a good editorial process or the people in the roles are themselves prone to a large number of errors in the use of vocabulary and grammar. Sometimes I look at one of these wrong-word errors and say "well, it passed spell check." Some of these errors change the meaning of the passage or the entire article. In general they don't care. They are just pushing out content as quickly as possible to move on to the next story.

Yes, dictionaries should reflect the usage. But they should not absorb all of the erroneous uses that are common. This kind of thing creates an approval for careless vocabulary and grammar.

I have some awareness of the creation of the OED, which included searches for the uses of words in past literature. This is one book on the topic of which I am aware:

The Professor and the Madman (HarperCollins, 2003) by Simon Winchester

I have attended a presentation by the author of this book, read at least some of it, and seen the adaptation.

But since the initial publication, the OED has had a measured pace of adding new variants. Other dictionaries seem to rush a new edition every year and trumpet the up-to-the-minute uses they have included. The OED seems to take a "hall of fame" approach where it takes time before new definitions are added.

Communication is successful when the composer selects words and grammar that are appropriate to the topic and the audience. Different vocabulary is used for different audiences. But the words used must be clearly defined if they are not part of established usage.

For this reason, I wonder about uses of "fingerprint" and "handwriting" and even "ghostwriter" in this thread. I think I know what they mean in this usage because they are terms I use in my own work. But sometimes the context makes me wonder if another variation is being applied.

James

949faktorovich
Apr 22, 2022, 1:31 pm

>941 prosfilaes: "Take Crayola's box of 64 crayons. You're saying that they're either red, blue or green at the base. I'm saying they all look slightly different, and grouping them into a certain number of colors is going to be arbitrary. Maybe there is some fundamental difference in the wax or additives that justify grouping them in less than 64 piles, but I'm going to need to see more than some random numbers." A carefully chosen corpus of books would not be a box of 64 differently colored crayons, or crayons with different names written on each of the crayons. The first instance would be like choosing 64 different genres/ book types to compare to each other (as widely divergent as a children's book and an academic book). The second option would be the choice of 64 different bylines without any repeating bylines, to check if the style is consistent within a byline or if a given "author" exhibits a similar style in more than a single text. And I am not saying the colors are "at the base", but that the entire crayon is a certain color, and has a certain name written on its side (researchers should be blind to the name written on the side, but not to the intuitively obvious color of the crayon as a whole). And if you had a box of 284 "slightly differently" colored crayons, since there are only so many basic colors, you would still have clusters of bluish, reddish, etc. colors. If the box is in a perfect distribution pattern where each is equally slightly different from the color next over in the rainbow, it would be impossible to reach an attribution conclusion; but if the colors have spikes of similarity around certain points in the blue, red, etc. ranges, then a researcher should see an obvious pattern of similarity. Or there might be a couple of obviously pure-blue texts, and a couple of pure-red, and some pure-purple texts, whereas most might be different from each other, or might form a single similar cluster. If you know before you start an experiment that "all look slightly different" in your box of crayons, then you have chosen this strange combination of slightly different texts when you are attempting to check an attribution for specific colors/ bylines. The data on my GitHub is not at all random and shows clear clustering into six groups of different sizes, with more than a single genre represented in each group. You can see the dozens of matches between texts within a group and a pattern of near-total absence of matches with other groups for each of the 284 tested texts.
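
As a toy sketch of the kind of grouping described above (hypothetical text names and pairwise match flags, not the actual 284-text corpus or the 27 tests), texts that match pairwise chain together into clusters, and texts without matches stand alone:

# Toy sketch: group texts by pairwise "matches" (hypothetical data).
# Two texts are linked if their styles match; groups are the connected clusters.
matches = {
    ("text_A", "text_B"), ("text_B", "text_C"),   # one cluster of three
    ("text_D", "text_E"),                          # a second cluster
}
texts = ["text_A", "text_B", "text_C", "text_D", "text_E", "text_F"]

def group_by_matches(texts, matches):
    """Return clusters of texts connected (directly or indirectly) by matches."""
    parent = {t: t for t in texts}
    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t
    for a, b in matches:
        parent[find(a)] = find(b)
    clusters = {}
    for t in texts:
        clusters.setdefault(find(t), []).append(t)
    return list(clusters.values())

print(group_by_matches(texts, matches))
# [['text_A', 'text_B', 'text_C'], ['text_D', 'text_E'], ['text_F']]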

"All turned to that same group of writers I find beyond belief." Then, you still haven't read BRRAM, where I explain that a handful of publishers (including William Byrd) were granted legal monopolies by Elizabeth I (who used the Workshop as her own ghostwriters) over the press in Britain at the dawn of British publishing in the 1560s-70s, which held across the following century of the Workshop's control. So, nobody could print/ sell a book in Britain without asking for permission from these under-10 publishing monopolists that held sectors such as all textbooks, all poetry/music books, etc. Additionally, all independent contracting for the working class was illegal under Vagabond Laws that meant that serfs could not leave their Lords' lands physically or seek employment in writing or any profession without receiving official permission from their feudal Lord. Feudalism was not yet over. There was a complete monopolization of publishing, so that nobody could freely publish/ write/ contract with nobility as a ghostwriter without potentially facing imprisonment for fraud (writing under a false name), vagabondism (laboring while in the lower-class for money), or monopoly-infringement (publishing without a license or permission from a monopolist). It is really a very complex socio-economic explanation that I give across the BRRAM series that cannot fit here. But if you can ask a more specific question, perhaps I can explain it in a way you will understand.

"Maybe the Archbishop didn't write the works attributed to him, but they would have been written by someone who worked for him." The problem with this belief can be explained by studying the case of NY's governor, who was caught asking his staff to ghostwrite his book for him. When a public figure directly asks for ghostwriting help from their direct inferiors/ workers under them, this leaves them directly vulnerable to those workers outing their misbehavior, and using this to push them out of a job. In contrast, the indirect use of ghostwriters, or hiring ghostwriters under NDAs or in manners that make the ghostwriter not want to disclose the help, means the public figure is very unlikely ever to be outed (at least not unless they fail to pay their bill to the ghostwriter). I can explain this in more detail again, if you are truly struggling to believe that what I am proposing is the most rational explanation for the British Renaissance.

"What you're saying is that an attribution method can't be objectively tested, but it is correct if it matches your biases." I am absolutely not saying anything of the sort. I have explained that I have tested my method by using several other types of attribution, such as handwriting analysis, to make sure that these other methods led to the same attribution results; and they have. "August Derleth wrote in multiple genres. It's well documented that he did. It's well known that Lovecraft wrote certain works, and that Derleth wrote pastiches of Lovecraft's style. Separating an author from the pastiches of his works and identifying an author working in multiple styles are two tests that must be passed if this method is to be taken seriously, but you've rejected these tests out of hand." You are saying that separating "pastiches" from authentic texts is "important" for attribution analysis, but you are not explaining why. You have to define your terms in such a statement to explain what you are demanding. "Pastiches" are "imitations"; this term is different from a satire in that it is a sincere imitation that mimics the style, and does not exaggerate the style to make a political etc. point. Imitative genres like science fiction are designed to make it easy for ghostwriters to work under multiple bylines or to exchange bylines without readers noticing a change of style, because so many of the elements in the genre are so similar that the small linguistic patterns that distinguish styles become intuitively unnoticeable. But such imitation is not a modern invention, as the strict structure of Renaissance 5-Act plays was also highly repetitive and imitative. Thus, because my test has been able to distinguish Jonson's from Percy's "Shakespeare"-bylined imitative/formulaic plays, I have already proven that my method can distinguish between "pastiches" that appear on the surface intuitively to be similar and find the underlying authentic linguistic divergence between authorial styles.

950faktorovich
Apr 22, 2022, 1:36 pm

>942 prosfilaes: Insects both are and are not animals, depending on the dictionary definition one has in mind when they make this statement.

How exactly do you test your subordinates to check if they are illiterate?

951faktorovich
Apr 22, 2022, 1:49 pm

>943 Keeline: The following paragraph from Volumes 1-2 of BRRAM came to mind:

In 1578, Harvey released his first attempt at Latin verse, "Smith and Tears Muses Cry for Death, Honors, and Men... Thomas Smith, Esq., Majestic royal secretaries". This title is similar to the title of an elegiac poem Harvey chose for “Spenser” in "Tears of the Muses" (1591). "Smith" is a collection of Latin elegies dedicated to Smith, written after this patron of Harvey’s died in August of 1577. Just as Spenser had given Harvey books as part of the payment for Harvey’s ghostwriting, Smith appears to have given Harvey “manuscript books” during his lifetime that Harvey was permitted to retain after Smith’s death on permission from his widow. This seemingly simple inheritance was strangely chastised during Smith’s funeral in the oration of Andrew Perne, who claimed these books for himself. Contrary to this somber occasion, Harvey replied with a witty public joke wherein he refused to surrender his hard-won plunder. Perne retained a grudge against Harvey that later blocked Harvey’s academic progress. Just as John Day’s expulsion from Cambridge for “stealing a book” in 1592 must refer to a plagiarism breach rather than a mere failure to return a library book, the exchange between Harvey and Perne over “manuscript books” was probably connected with Harvey’s ghostwriting. Two possibilities are that Harvey could have wanted to resell unpublished manuscripts he wrote under Smith’s sponsorship, or he might have begun annotating these books as part of his ghostwriting research.

952Keeline
Apr 22, 2022, 2:00 pm

>951 faktorovich:

Just as John Day’s expulsion from Cambridge for “stealing a book” in 1592 must refer to a plagiarism breach rather than a mere failure to return a library book, the exchange between Harvey and Perne over “manuscript books” was probably connected with Harvey’s ghostwriting. Two possibilities are that Harvey could have wanted to resell unpublished manuscripts he wrote under Smith’s sponsorship, or he might have begun annotating these books as part of his ghostwriting research.


I am reminded of the phrase "If you can't say what you mean, then you can't mean what you say."

A lot seems to hinge on this subtext interpretation of these situations.

James

953faktorovich
Apr 22, 2022, 2:38 pm

>945 anglemark: Here is a fragment of my evidence regarding ghostwriting for lawyers in Volumes 1-2:

During the years Harvey spent attempting to establish a law practice in London, he was especially desperate for an income, so he appears to have first contracted with several lawyers to ghostwrite texts that helped lift their social-standing. Based on linguistic tests of “Warner’s” Albion’s England, one of these contracts was with “William Warner’s” (1558?-1609) to first ghostwrite Pan His Syrinx (1584). Warner was an Attorney of the Common Pleas working in London, so Harvey would have sought out his advice on succeeding in this new field. Warner and other lawyers might have also hired Harvey as their secret-secretary for legal briefs or correspondences they had to write, but these were not tested in the study. Harvey’s connections among lawyers also included Thomas Watson Edwards, a lawyer at Lincoln’s Inn, for whom Harvey ghostwrote a handwritten unpublished epilogue called “Narcissus, L’Envoy” (French for, “Narcissus, The Messenger”) on a printed copy of Cephalus & Procris: Narcissus (1595).

As for paper-degrees, these are proven by instances when "authors" such as "John Fletcher" and "John Donne" are claimed to have started studying at Cambridge when they were only 12, when the normal age for admission was 16. Studies of university rolls from this century have uncovered blatant fraud in false ages being provided for students, and other frauds of varied sorts have been proven.

Old English is near-identical to a variant of Old German; Early Modern English was built heavily with German words; German(ic) monarchs had been in charge of the Roman Catholic Church for 800 years leading up to the Renaissance. The French/Norman conquest happened a few centuries earlier. Do you need more proof that some of the richest people in the UK were more likely to speak German/French/Latin than English? You don't know what an "English variant" is? I don't know how to start explaining it. There are variations of language across regions in the British Isles; some spots might have been mostly speaking Dutch or German or French or Scottish or Gaelic, or combinations of these languages and English; the various mixtures of English, Old English, and other languages, and regional variations is what I am referring to; it's not any single language but this Babel of multiplicity that would have made it difficult for a judge to be understood and to understand people of a different region than his own original region. The absence of any documents in these judges' own unique handwriting, bearing their signatures, proves they must have been illiterate, or they would have left evidence of their literacy. We are talking about all of the courts that left written records in a single scribe's (or two scribes') handwriting style.

You don't know what a "written opinion" in a case is? Can you specify what you are asking me to define?

I provide evidence for ghostwriting for judges/ archbishops/ kings/ queens etc. across BRRAM. Just read the series, or search through it for names you are interested in. Yet, you still haven't requested a review copy, and you are attempting to have me digest the information for you into bites that are taken out of context; but you really have to read the entire series to find all of the citations for all of the research, as my brief summations in this discussion cannot include pages of citations without replicating the entire series.

If a "scribe" was actually a ghostwriter; then, they were not really merely a scribbler or transcriber of notes. I mean they ghostwrote the texts they are credited with only transcribing. The "scribe"/ghostwriter would be the actual judge in complex cases who would not only record the testimony, but would also write up the opinion for the judge; alternatively, a "scribe"/ghostwriter might also use merely the finding of guilty/innocent to write an entry about the case without actually having been at the proceedings (knowing those on trial are illiterate and could not read the writeup). I tested several texts Harvey ghostwrote for lawyers, as I explained in a quote earlier today. I did not test all of the available law texts from this period because I have been focusing on texts that have previously been the subject of attribution studies (and Renaissance law texts haven't been analyzed). My bibliography does include several tested sermons and other theological books from the top Archbishops/ priests in England, and the corruption of the church with ghostwriting-purchasers, as well as the lawyers is sufficient proof of general corruption in this direction. "How did you get access to the court documents, in order to look at the handwriting?" The court documents from the Renaissance that I used for handwriting analysis have been digitized and are freely available online.

The grouping of the handwriting styles in the "Illustrations" file is the conclusion/ analysis. The "Order of Penance" matches Verstegan's handwriting in his self-attributed letter that is provided in this file as well; yes, both are in Verstegan's handwriting, and yes Verstegan is one of the six ghostwriters/ "scribes". So look through all of the handwriting styles in the Verstegan group, and you should see that they all match each other; and these also include a letter from "Sir Walter Raleigh" and "Richard Carew". If you focus just on the signatures on the "Order of Penance" and compare them to the many signatures on the "Letter... from the Privy Council" you should notice that many of the signatures are clearly written in a single handwriting style. I explain various handwriting patterns etc. in this Verstegan group in the "Introduction" to Verstegan's self-attributed "Restitution" - the volume that I am currently translating.

954faktorovich
Apr 22, 2022, 2:41 pm

>946 anglemark: We would all be able to more clearly understand what each of us means if dictionaries were able to capture meaning more accurately, thoroughly, consistently and rationally.

955faktorovich
Apr 22, 2022, 2:52 pm

>947 Keeline: What exactly were you trying to prove by citing this book? It seems it proves my argument more than your objections. Look at page 51, which cites Sir Edwin Durning's argument that "Shakespeare" was illiterate, as proven by the fact that the "same clerk" that wrote the "decision" also signed for "Shakespeare". If you look in my handwriting "Illustrations" file, you will see several examples of this wiggly "Shakespearean" handwriting in the Percy group - alongside with other signatures etc. in the same style. Though you are not quoting Durning, but rather the author of "In Search", who instead attempts to contradict the visually obvious evidence to argue that not only "Shakespeare", but also JFK had authentic handwritings (despite the latter's use of clerks etc., as documented). One problem is that in a chapter with "Bacon" in its title, Hamilton fails to include an illustration of Bacon's handwriting; these illustrations of "Bacon's" signatures are included in my file in the Northumberland MS; Northumberland was William Percy's home estate (I explain the significance across BRRAM).

956faktorovich
Apr 22, 2022, 3:01 pm

>952 Keeline: Subtext? Harvey's linguistic style matches several lawyer-assigned texts; this proves he ghostwrote for them. None of them lived long enough to have instead been the actual ghostwriters of Harvey's texts. So, actually, it all hinges on whether you review all of BRRAM and all of the various types of evidence I provide, or whether you keep taking a couple of sentences out of context and suggesting that those particular two sentences are all I say in the 17+ volumes so far of BRRAM.

957lilithcat
Apr 22, 2022, 3:38 pm

>948 Keeline:

The Professor and the Madman is actually the same book as The Surgeon of Crowthorne, an example of the not infrequent differences of title between the British and American editions.

958Keeline
Apr 22, 2022, 5:29 pm

>955 faktorovich:

I was specific in my reply and cited the text and the person to whom I was replying. Look at every part of the reply again and see if it makes more sense. You are welcome to read and respond to it, but it was not a reply directed towards you, as indicated by the top of the message.

A question was raised about what style of handwriting was used. I provided a book by an expert on handwritten documents that shows a lot of examples of secretary hand with which I am familiar. Then anglemark could make a comparison to try to identify the style.

The fact that a scholar addresses an argument in their book only says that the claim is out there. It does not mean that the author agrees with it unless that is so stated.

There is a Shakespeare play with the line "The first thing we do, let's kill all the lawyers." But just because an unsavory character makes that statement does not mean that it is the author's sentiment. Yet, you will see T-shirts and other objects that will take a quote like that and ascribe it to the author and not the character. It is careless or a deliberate misrepresentation, depending on who is doing it.

James

959Keeline
Apr 22, 2022, 5:37 pm

>953 faktorovich:

During the years Harvey spent attempting to establish a law practice in London, he was especially desperate for an income, so he appears to have first contracted with several lawyers to ghostwrite texts that helped lift their social-standing. Based on linguistic tests of “Warner’s” Albion’s England, one of these contracts was with “William Warner’s” (1558?-1609) to first ghostwrite Pan His Syrinx (1584). Warner was an Attorney of the Common Pleas working in London, so Harvey would have sought out his advice on succeeding in this new field. Warner and other lawyers might have also hired Harvey as their secret-secretary for legal briefs or correspondences they had to write, but these were not tested in the study. Harvey’s connections among lawyers also included Thomas Watson Edwards, a lawyer at Lincoln’s Inn, for whom Harvey ghostwrote a handwritten unpublished epilogue called “Narcissus, L’Envoy” (French for, “Narcissus, The Messenger”) on a printed copy of Cephalus & Procris: Narcissus (1595).


I see a narrative claim here but not citations from scholars to corroborate the claims. The titles mentioned are the texts for which the authorship claims are made but nothing (here) to show extrinsic evidence that this narrative is what occurred. At best it is what might have occurred if one is to believe that everyone in the period was functionally illiterate and only six magical people were capable of writing just about everything in a certain period of time.

You mention Verstegen a lot and claim he was a "secret-secretary" for Queen Elizabeth. However, does this seem very likely given his strong Catholic and Papal ties in an era when the Pope of that period of time effectively put out a death warrant on Queen Elizabeth? I've heard of the phrase "keep your friends close, and your enemies closer" but this seems rather risky and unlikely to me.

James

960Keeline
Apr 22, 2022, 6:02 pm

>956 faktorovich:

Yes, subtext. Rather than take the literal meaning of "stealing a book" or "manuscript books", you seem insistent that they can only mean plagiarism or ghostwriting. Since that is not explicitly stated, it is subtext: a particular reading of the words and phrases beyond what is directly stated.

I stipulate that under your aggregated measurements that you find similarity in the writing style. But since you reject the idea of calibrating the analysis tools with modern tests under the claim that it would be too expensive or everyone cheats, it is hard to give it the weight you desire for literature which has a couple centuries of scholarship and evidence. Any stylometric claim needs to be prepared to explain why it works.

It is one thing to say that this is consistent with 99% of authorship claims but here are a few interesting exceptions. But instead the theme here is that virtually nothing of what has been attributed before could possibly be correct and only this alternate list is the answer. This is why I have not embraced this as "really wrote" as if it is THE TRUTH.

You mentioned earlier that Newton's formula for gravitation could not be understood without reading a whole book on the topic. That would come as a surprise to many who have taken physics classes at all levels, from high school to upper-division university courses, where the formula is part of one chapter that explains what it means.

F = G · m1 · m2 / r^2

The concept is fairly simple. The force of gravity is proportional to the product of the two objects' masses and inversely proportional to the square of the distance between them. G is a gravitational constant. More mass and/or a smaller distance between the centers means a greater gravitational force.

Of course, this explains (to a first approximation) how gravity behaves, and only in special circumstances do we realize that Einstein's relativity is needed to account for the exceptional cases. For most of human experience, Newtonian physics works. Only when speeds get very high (near the speed of light) or gravitational fields get very strong do we need Einstein's relativistic calculations. It does not explain what makes gravity. Why is it an inherent feature of matter? Even thick books like Gravitation by Charles W. Misner, Kip Thorne, and John Archibald Wheeler cannot tell us exactly what causes gravity. But we can describe it, measure it, and predict the motion of (limited numbers of) objects under it very well.

James

961raidergirl3
Apr 22, 2022, 7:10 pm

>960 Keeline: that’s the formula we learned in my HS physics class today. It’s my favourite formula in physics, lol.

962faktorovich
Apr 22, 2022, 8:38 pm

>958 Keeline: You are yet again ignoring all of the essential points I have raised, and instead are imagining wrongs I have committed that I never committed, and adding points that are entirely irrelevant or indicate you did not read my argument before babbling onwards.

963faktorovich
Apr 22, 2022, 8:55 pm

>959 Keeline: All of the texts I mention in this paragraph I tested and thus they are all cited in the bibliography of the 284 tested texts that was included in this Volume 1-2 of BRRAM, and is available on my GitHub page. The surrounding chapter cites plenty of other scholars, as well as other primary sources that support the various claims raised in re-attributing this cluster of texts to Harvey. I am not going to quote the entire chapter here. You are going to have to request a review copy, and read the full chapter to avoid being confused about where the rest of the evidence is. The paragraph does explain Harvey's pattern of ghostwriting for lawyers and evidence of this fact outside of linguistics. If you have to accuse a researcher like me of being a witch or of concocting "magical people"; your argument has no real weight, and so you are grasping at fictions you are making up to shut down my rational and precisely explained argument. Verstegan's "Restitution" was initially listed as printed in Antwerp in 1605, but it was also entered into the London Stationers’ Register and was sold in London by John Bill and John Norton (who apprenticed with his uncle William Norton, and then registered as a printer in 1590, and later thrice became a Master of the Stationers’ Company). Then, the 1628 and 1634 editions of "Restitution" were printed directly by the British King’s Printer (after James I died in 1625 and Charles I took the throne): “Printed by John Bill, Printer to the King’s most Excellent Majesty”. Across these decades, Verstegan was operating a major Catholic exile publishing company in Antwerp and was on a pension from Spain/ the Pope and other Catholic patrons, as well as ghostwriting under various names for James I, and various other variedly powerful British contractors. The publication of one of Verstegan's rare English self-attributed books by the British King's Printer proves that he was working for and with both sides of these political and theological conflicts.

964prosfilaes
Apr 22, 2022, 9:05 pm

>950 faktorovich: Insects both are and are not animals, depending on the dictionary definition one has in mind when they make this statement.

No. Depending on the definition. Dictionaries are guides to the language, not an inherent part of it. And your inability to handle this part of modern English really brings to question your skill in interpreting Early Modern English.

How exactly do you test your subordinates to check if they are illiterate?

Ask them to read something to you and write down something for you? If you're illiterate, and you suspect they're going to try and fake it, one person writes it down and another person reads it.

>953 faktorovich: Old English is near-identical to a variant of Old German; Early Modern English was built heavily with German words; German(ic) monarchs had been in charge of the Roman Catholic Church for 800 years leading up to the Renaissance. The French/Norman conquest happened a few centuries earlier. Do you need more proof that some of the richest people in the UK were more likely to speak German/French/Latin than English? You don't know what an "English variant" is? I don't know how to start explaining it. There are variations of language across regions in the British Isles; some spots might have been mostly speaking Dutch or German or French or Scottish or Gaelic, or combinations of these languages and English; the various mixtures of English, Old English, and other languages, and regional variations is what I am referring to;

Old English was dead and gone by 1200. I don't know of any strong German addition to Early Modern English. The Roman Catholic Church spoke Latin; the speech of Rome took over that of its conquerors. Frankish and Gothic were long dead languages by this point in history. No communities spoke Latin natively at this point, and even if someone was raised with Latin, they would have learned English from their community. They also would have been literate; nobody at this point would have taken the quixotic approach of teaching Latin to a child without teaching them to read and write as well.

In England, they spoke English at this point, with some small areas of Cornish speaking. Only 22,000 of the 84,000 people in Cornwall still spoke Cornish by the dawn of the 17th century; it was well on its way to extinction. The English of Northern England might have been hard to understand for a London speaker, but it was never a prestige dialect, and it's unlikely that a judge or anyone with a position of power wouldn't have understood London English.

That is, in England, the people in power spoke English. The days of Old English were long gone, the days of Norman French were past, and Latin was an academic language, not a native tongue. The prestige dialect, the language people in power spoke, was that of London; other dialects might be hard to understand, but judges wouldn't have exclusively spoken them.

It's like arguing that a judge in New York City may not speak English, instead speaking Munsee (since that was the earliest known language in the area) or Dutch (since it was New Amsterdam).

You've got this so wrong that it's amazing. Sections like this are where you so clearly prove that you don't know what you're talking about. If you can speak so confidently on something I can see you are so incorrect on, why would I read 16 volumes of material packed with claims I can't trust?

965faktorovich
Apr 22, 2022, 9:15 pm

>960 Keeline: I do not state that "stealing a book" can only mean plagiarism; but rather I have pointed out that past scholars have insisted that it means the physical theft of a book, without considering that it can be alternatively referring to "plagiarism". It does indeed refer to plagiarism, once all of the surrounding evidence is taken into account.

You are again imagining what I am saying without reading what I am actually saying. You proposed introducing an entirely irrelevant test of whether students would show similarity in writing style in a live-subjects test. You are failing to acknowledge that no previous computational-linguistic study in attribution had applied any such test to live subjects. Instead you are stating I have not been "calibrating the analysis tools with modern tests". "Calibrating" means to "adjust"; so, you want me to adjust "tools with modern tests"? This is just nonsensical. My tests are already modern and fully adjusted for the needs of the experiment. Then you add: "Any stylometric claim needs to be prepared". Yes, I have "prepared" extensive "stylometric claims" along with evidence to prove the conclusions of my "stylometric" analysis.

Then you claim that I claim: "virtually nothing of what has been attributed before could possibly be correct". This is inaccurate since in the "Lunch Test" we carried out in this discussion, several of my attributions matched the current attributions and the attributions the other computational-linguist came up with using his own method. So my method does come with many commonalities with existing attribution claims, and certainly not "virtually nothing". An exception might be if you are looking only at the British Renaissance, and are considering the texts self-attributed to the six ghostwriters to be too small in number versus the full corpus of 284 texts - but this is just the actual truth regarding authorship in this period.

While a student might have a sense they know everything if they just understand what the elements in the gravitational equation mean; a professor of this subject would know that food for scientific research into such a formula lies in the hazy matter of what the formula fails to communicate or raises doubts about, and that is introduced in the rest of the book on this subject. Thus, if you stop at the simple points I am raising about my method in this discussion without reading the book; you are taking the easy road of the student who needs to calculate simple math problems on a test, instead of having the curiosity of a researcher to understand what beyond these numbers might need further research and further study. You are focusing on proving me wrong just to avoid reading the rest of the book beyond this formula; so you are like a student who interrupts the class with a crass joke because you are struggling to understand a topic, and would prefer to insult the teacher to prove nobody really needs to understand it because the teacher can be nullified with an insult.

966prosfilaes
Apr 22, 2022, 9:18 pm

>963 faktorovich: you are grasping at fictions you are making up to shut down my rational and precisely explained argument.

Describing your argument as "rational and precisely explained" is almost counterproductive; "rational and precisely explained" arguments usually have someone besides their creator to so describe them.

The publication of one of Verstegan's rare English self-attributed books by the British King's Printer proves that he was working for and with both sides of these political and theological conflicts.

You can assert that as much as you want, but no, that doesn't come anywhere near proof. I do not buy that any author was spending a lot of time writing on both sides of a theological conflict at the same time, and it would take far more solid evidence than you've presented to convince me of it.

967faktorovich
Apr 22, 2022, 9:33 pm

>964 prosfilaes: If you think there is only one meaning/ spelling for any given word, it is instead you who should not attempt to translate Early Modern English. Take a look at the 17 potential meanings/ spellings that come up when you just look up the word "but" in the Middle English Compendium: https://quod.lib.umich.edu/m/middle-english-dictionary/dictionary?utf8=%E2%9C%93...

You have to know if an author who is using variants as different as "bute(n, butte, bud; boute(n, bout; beote(n, beute; bote(n, botte, bot, bod, pot(te" is using them in the sense meant for one specific variant out of the 17+ possible options. So if you enter such a translation with the assumption that "insect" must be an "animal", and you use this interpretation in the text you are translating, you would probably miss the subtext or the specific meaning that the author had in mind, as he might have specifically intended to say that an "insect" is not a "mammal"/"animal".
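
As a toy illustration (a hypothetical, abbreviated variant table, not the Middle English Compendium's actual entries), a modernization pass first has to collapse many attested spellings into one headword, and even then the intended sense still has to be chosen from context:

# Toy illustration: many attested spellings map to one modern headword,
# and the headword still has several candidate senses to weigh by context.
VARIANTS = {  # hypothetical, abbreviated variant table
    "buten": "but", "butte": "but", "bute": "but",
    "boute": "but", "bot": "but", "bod": "but",
}
SENSES = {  # abbreviated, hypothetical sense list
    "but": ["except", "only", "unless", "without"],
}

def modernize(word):
    headword = VARIANTS.get(word.lower(), word)
    return headword, SENSES.get(headword, [])

word = "boute"
headword, senses = modernize(word)
print(f"{word!r} -> {headword!r}; candidate senses: {senses}")
# Picking among the candidate senses still requires the surrounding context;
# the lookup alone cannot decide which meaning the author intended.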

What you are saying, "Dictionaries are guides to the language, not an inherent part of it", is completely nonsensically philosophical and irrelevant for the reality of translating complex texts with a myriad of possible meanings.

"No communities spoke Latin natively at this point... nobody at this point would have taken the quixotic approach of teaching Latin to a child". Can you cite whatever you are trying to say here? Most of the books that were published between the first printed book and at least the middle of the Renaissance were in Latin, including those printed in Germany, France, England, etc. If nobody had been taught to read and write in Latin; this language would not have been used in so many books.

Cornish, Welsh, Scottish... just keep adding the variants and you should eventually convince yourself that I'm right. "Norman French were past"? The Norman-French remained as aristocrats in England, and they kept using French as a standard in the English court; this is a pretty generally known fact.

It is faulty to equate language variants in the Modern world with those before the standardization of spelling/ usage/ pronunciation. Most households just owned a single book (the Bible). There was no TV, or globalized forms of entertainment that would have allowed folks to hear the "proper" pronunciation for words, and the pronunciation rules in textbooks were too vague, and the English language was secondary to learning Latin classics in schools.

As always you end by congratulating yourself on being perfectly correct, and insisting I am completely wrong, without having actually provided any rational evidence to support this grandiosely self-important conclusion.

968faktorovich
Apr 22, 2022, 9:36 pm

>966 prosfilaes: Yes, and that bundle of "far more solid evidence" is in the BRRAM series, which you are refusing to read because you know what the absolute truth is, and you are sure without reading the evidence that your ideas about this absolute truth are absolutely true.

969prosfilaes
Apr 22, 2022, 9:36 pm

>965 faktorovich: you are like a student who interrupts the class with a crass joke because you are struggling to understand a topic, and would prefer to insult the teacher to prove nobody really needs to understand it because the teacher can be nullified with an insult.

Except that we aren't students and you aren't a teacher here. You expect that we buy your knowledge here and ask you to explain things we don't understand. We are equals in this forum, and we're asking you to defend your claims, which you have completely failed to do. You're the street preacher whose bloviating is merely worthy of mockery, and any countering is more for the other audience or for the heck of it, since the preacher doesn't care about facts as you know them.

"Calibrating" means to "adjust";

I.e., you either don't know what calibrating means or you want to obfuscate the issue. You take a tool and calibrate it because you want precise results and things can get a little out of skew over time. You calibrate even the most precise and expensive tools. You especially calibrate a tool when it's new; when a reputable company sells a thousand dollar scale or laser measure, someone has carefully taken the product and adjusted screws until everything lined up and it produces the best results, and then when it's been shipped, the buyer does that on the other side. You're telling me there's no point in carefully measuring whether each feature reliably predicts authorship, or whether it varies wildly between different works of the same author? That doesn't say anything good about your system.
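
A toy version of that kind of check (made-up snippets and a single crude feature, not any particular stylometric suite): measure a feature on several works of known authorship and see whether it varies less within an author than between authors before trusting it.

# Toy calibration check: does a candidate feature (here, the rate of the word "the")
# stay stable across works by the same author, and differ between authors?
# The "works" below are made-up snippets, purely for illustration.
works = {
    "author_1": ["the cat sat on the mat by the door",
                 "the dog and the bird watched the road"],
    "author_2": ["cats sit quietly near doors sometimes",
                 "dogs and birds watch roads at dawn"],
}

def the_rate(text):
    words = text.lower().split()
    return words.count("the") / len(words)

rates = {author: [the_rate(w) for w in ws] for author, ws in works.items()}
for author, vals in rates.items():
    spread = max(vals) - min(vals)
    print(f"{author}: rates {[round(v, 2) for v in vals]}, within-author spread {spread:.2f}")

means = {author: sum(vals) / len(vals) for author, vals in rates.items()}
gap = abs(means["author_1"] - means["author_2"])
print(f"between-author gap {gap:.2f}")
# A feature is only useful for attribution if, on texts of known authorship,
# the between-author gap is reliably larger than the within-author spread.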

970prosfilaes
Apr 22, 2022, 10:03 pm

>967 faktorovich: If you think there is only one meaning/ spelling for any given word,

Nice strawman. The point was that you refused to read "animal" in the appropriate meaning in the current context, and made a big fuss about it.

What you are saying, "Dictionaries are guides to the language, not an inherent part of it", is completely nonsensically philosophical and irrelevant for the reality of translating complex texts with a myriad of possible meanings.

The fact that dictionaries don't have all possible meanings, and that many of their meanings may be limited to a certain group or time period, even if not noted, is irrelevant to the reality of translating complex texts?

"No communities spoke Latin natively at this point... nobody at this point would have taken the quixotic approach of teaching Latin to a child".

No communities spoke Latin natively at this point, and even if someone was raised with Latin, they would have learned English from their community. They also would have been literate; nobody at this point would have taken the quixotic approach of teaching Latin to a child without teaching them to read and write as well.

I'd say that's one of the most obvious misquotes by ellipsis I've ever seen. "Nobody would have done x without y" means something clearly different from "Nobody would have done x". Yes, people learned Latin, as a second language, among educated people who would have learned to write along with learning Latin.

Cornish, Welsh, Scottish... just keep adding the variants and you should eventually convince yourself that I'm right.

The EU has 24 official languages: Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish and Swedish. Guess what; all the people in the EU headquarters speak English, because that's the established lingua franca, and the nations wouldn't send an ambassador to the EU who couldn't speak English. Sure, there are quite a few languages spoken in the British Isles, but if you wanted to be a judge in England you spoke London English. You weren't a peasant who spoke only Welsh or Cornish or Scots or Gaelic.

The Norman-French remained as aristocrats in England, and they kept using French as a standard in the English court; this is a pretty generally known fact.

Nope, my sources say that French was mostly out of the English court by the 16th and 17th centuries. And it's a pretty generally known fact that the author of William Shakespeare's plays is William Shakespeare; it's weird to see you citing known facts as if you don't ignore them at will.

971bnielsen
Edited: Apr 26, 2022, 1:47 pm

>903 Crypto-Willobie: Be prepared for >1000 lilithcat: coming up soon :-)

972faktorovich
Apr 23, 2022, 12:26 pm

>969 prosfilaes: A "teacher" is "a person who teaches" or gives "information about or instruction in (a subject or skill)". So when you say that I am selling "knowledge" (whereas I am actually donating this knowledge for free), you are saying that I am teaching you. So you must not understand what the terms "teacher" and "student" mean, and that's why I am providing the definitions for these various terms (including "calibrating") to explain why what you are saying is nonsensical or untrue.

Yet again you are imagining that I am saying things I am nowhere near saying when you claim: "You're telling me there's no point in carefully measuring whether each feature reliably predicts authorship". I have actually stated repeatedly the exact opposite. As I have explained, I tested and re-tested my 27-tests method on not only 284 different texts in the Renaissance, but also dozens of texts in several other centuries. I have fine-tuned my method, and explained how I polished it to be presentable and precise in BRRAM.

973faktorovich
Apr 23, 2022, 12:53 pm

>970 prosfilaes: "You refused to read 'animal' in the appropriate meaning in the current context". You are still not understanding my explanation regarding multiple variant spellings/ denotated meanings. Each speaker/writer in a language has an ability to choose the denotation or the definition for a term they want to use out of any dictionary that they prefer that fits their specific context and intended meaning. It is up to the reader/ interpreter to grasp what the speaker's intended meaning is. As long as the speaker is not inventing their own definition or their own rules of grammar, or breaking away from all existing definitions/ spellings of a given word; the speaker is not in the wrong, as he or she has expressed whatever it was he or she was trying to state. The reader is in the wrong if he or she attempts to use the wrong definition and thus fails to understand the speaker and rephrases the statement as the opposite or an unintended version of itself.

Consider that James I did not speak any English when he took the English crown in the union of the Scottish and English crowns in 1603. When James migrated to London at this point, he brought with him cronies or "thirty-pound gentlemen/knights" who paid him money to receive titles within the monarchy (including baronies, knighthoods, and certainly also judgeships). Jonson was imprisoned over explaining this practice in "Eastward Ho!" (1605) Here's some info: https://mapoflondon.uvic.ca/THIR1.htm Percy and Jonson explained these corrupt practices in more detail in some of their last or posthumous plays, including Percy's "Captain Underwit" (about a military "captain" who purchased this title) and Jonson's "Variety" (about a wealthy landowner who purchases a Ladyship title for his mother, and a gentleman title for himself etc. with his money, a process that involves corruptly bribing a Judge to perform other illegal acts for money). "Captain Underwit" was published in the first half of BRRAM, and "Variety" is finished and forthcoming after I finish the second half of BRRAM. Both of these include various types of evidence in their introductions that explain and support why these plays are not referring to hypothetical, but rather to specific, witnessed corrupt crimes. Returning to the language problem, James I brought his Scottish friends who like him also did not speak English, so when they became judges, they would have had extreme difficulty following London speakers in a court proceeding.

Reports/ court proceedings continued to be written in Law French across the Renaissance: https://www.google.com/books/edition/Law_and_Empire_in_English_Renaissance_Li/ui...

974FAMeulstee
Apr 23, 2022, 1:08 pm

>973 faktorovich: Still, an insect is an animal; that is not up to language specialists to define. It is part of classification in biology.

975faktorovich
Apr 23, 2022, 8:52 pm

>974 FAMeulstee: If there was only one answer as to whether an insect is an animal or not; humanity would have probably also solved many other problems that have been plaguing it for millennia. As things stand, the definition of "is" was in doubt in the Lewinsky scandal; and currently "electoral democracy" and "individual freedom" are just a few of the terms that are "up to language specialists to define"; but the specialists disagree; and everybody wants to invent their own definitions.

976prosfilaes
Apr 24, 2022, 1:17 am

>973 faktorovich: Each speaker/writer in a language has an ability to choose the denotation or the definition for a term they want to use out of any dictionary that they prefer that fits their specific context and intended meaning. It is up to the reader/ interpreter to grasp what the speaker's intended meaning is.

Then why do you keep telling us what the words we used meant?

Consider that James I did not speak any English when he took the English crown in the union of the Scottish and English crowns in 1603.

Citation needed. Again, you depend on established history a lot for someone who is tossing it all aside. His Daemonologie looks pretty English to me; I'm sure you're going to say he didn't write it, at which point I have to wonder how you would know one way or the other.

he brought with him cronies or "thirty-pound gentlemen/knights" who paid him money to receive titles within the monarchy (including baronies, knighthoods, and certainly also judgeships).

{sarcasm}Certainly also judgeships.{/sarcasm} Again, it reminds me of a conspiracy theory; a pile of "solid" evidence that would only be believed by someone who already bought the theory. You didn't bother to respond to my point, that even if we accept that as true, the most corrupt systems always still have naive, idealistic people, and systems that promote incompetents to sinecures with nominal responsibilities promote people who let it go to their head and think they're more competent then they should be.

these plays are not referring to hypothetical, but rather to specific, witnessed corrupt crimes.

Exceptions that prove the rule. This wouldn't be a big deal or even considered corrupt if it were the standard thing.

Reports/ court proceedings continued to be written in Law French across the Renaissance:

Law French. Which means that (a) you didn't study any of them and (b) it was a specialized dialect that nobody could have got by just knowing.

977anglemark
Apr 24, 2022, 11:04 am

>976 prosfilaes: Law French. Which means that (a) you didn't study any of them and (b) it was a specialized dialect that nobody could have got by just knowing.
Exactly. It was a jargon used only in written texts. Law students would be taught it, but it was not a language of conversation – by 1500 it was already pretty much a frozen register of isolated phrases with a lot of Englishisms in the grammar. The reason for that is that most of the lawyers and judges who used Law French did not actually speak French, and that English had to be used at the trials.

James I was something of a polyglot; he had been taught French, Greek, and Latin by his tutors. When he became king of England he apparently spoke English with a Scots accent, and was mocked for that by his detractors. That in itself is a pretty good indication that he did speak English – if we'd had any reason to doubt it!

-Linnéa

978faktorovich
Apr 24, 2022, 1:37 pm

>976 prosfilaes: Yes, as my data shows, Verstegan ghostwrote "Demonology" alongside books against this type of anti-witchcraft superstition. Here is a fragment I wrote about King James' language in BRRAM's Volumes 1-2:

James VI became King of Scotland when he was one, having been born in Edinburgh. He became James I and VI when he also gained England and Ireland’s crowns upon Elizabeth’s death in 1603; his crowning signified the joining of the Scottish and English kingdoms. The publications that were attributed to James prior to his ascension must have helped him secure this unprecedented union of the crowns. James’ native language was Middle Scots, which would have made the acquisition of English particularly difficult when he migrated to England at thirty-seven. The first anonymous text that has been re-attributed to him, A Short Treatise containing some rules… in Scottish Poetry (1584), was not reprinted in an English translation after its release in Edinburgh in Middle Scots. The attribution for the subsequent His Majesty’s Poetical Exercises (Harvey/Sylvester: 1591) is supported by its title’s possessive phrasing, as well as in the line “Robert Walde-grave, printer to the King’s Majesty”. This book opens with a few poems dedicated “To the King of Scotland” by “Henrie Constable”, M. W. Fovler and others, before a long poem commences without a byline at its start or end.

And sources that have discussed James' language include:

J. Q. Adams, “The Author-Plot of an Early Seventeenth-Century Play”, Library, 4th set, 25 (1945), 17-27.
C. H. Herford, P. and E. Simpson, eds., Ben Jonson, 11 vols. (Oxford: Oxford University Press, 1925-52), i, 15-6.

We are discussing a system where the tyrannical ruler (King James) only grants titles when they are purchased with bribes (including sexual favors, as he was known to give enormous grants/ estates to his male lovers), and has no motive to give any jobs, titles etc. to anybody who fails to bribe him. We might look from a distance at a society that is entirely built on bribes and corruption and assume that such a state would collapse because nobody would be doing anything. But back then, they expanded the number of people who were enslaved (while fazing out feudalism); so then things were being cleaned and built through free forced labor. And in modern society, the poor are being paid so little that many of them are homeless, or might have less real income than somebody who was enslaved and had a place to live. I saw a documentary the other day about an 1850 steam boat that was dug up from the mud, and the tools that were uncovered had identical designs to the tools that can be purchased in a supermarket today; this is not a weird coincidence, but the result of an absence in scientific/ industrial development in the field of tool design, and most other fields. If all professors of engineering/ chemistry/ mathematics etc. only receive their jobs and grants through bribes, and there is only the illusion of science being done via ghostwritten self-plagiarizing nonsense-papers; then, we are all scientifically stuck, and do not benefit from the discoveries actual intelligent people would be making if merit alone won grants and positions of scientific/research power.

9792wonderY
Apr 24, 2022, 2:14 pm

Laying money on a vigorous defense of “fazing out.”

980clamairy
Apr 24, 2022, 4:00 pm

>979 2wonderY: :D But who in their right mind would bet against it?

981lilithcat
Apr 24, 2022, 7:44 pm

>979 2wonderY:, >980 clamairy:

It's called the "Humpty Dumpty defense". See >416 lilithcat:

982faktorovich
Edited: Apr 24, 2022, 9:50 pm

>981 lilithcat: Sure, a defense is possible (though it is more likely I made a typo for "phased"). For example, one of the earliest versions that was first-printed in 1810 for "Humpty Dumpty" spelled "sat" as "sate": https://www.gutenberg.org/files/34601/34601-h/34601-h.htm Given the inconstant spelling within the "Humpty Dumpty" lyrics, they should not be used to police variant spellings for "phasing"/"fazing". And to "faze" is to "disturb or disconcert"; thus, "while disturbing/unsettling out feudalism" is still a generally coherent statement with a similar meaning. I have noticed a few spelling errors in your replies, alongside with the broadly false and nonsensical statements you have been making. I have chosen to focus on the latter bigger problems because typos are easily spotted by all, while the misattribution of a century of published books has gone entirely unnoticed by all until my BRRAM.

For example, prosfilaes, in 976: "even if we accept that as true, (Error 1: a semi-colon is needed in this "if"/"then" statement because the clause before and after this interruption have a subject and a verb) the most corrupt systems always still (Error 2: always and still are redundant: either one would deliver the message; though one refers to the future, and the other only up to the present) have naive, idealistic people, (Error 3: the sentence becomes confusing here because you are really starting a new thought after this interruption, so it should be a semi-colon or period; if you had changed it a terminal point, you would have realized you forgot to reach a conclusion) and systems that promote incompetents to sinecures with nominal responsibilities promote people who let it go to their head (Error 4: you should have used the plural "heads", since you used the plural "people") and think they're more competent then (Error 5: typo in using "then" instead of "than") they should be." (Error 6: you have started this thought with the goal of disproving my point regarding the overgrowth of incompetents in top positions in corrupt systems; but you have forgot about this goal, and by the end of this statement, you just restate my own point, without mentioning that you have decided to agree with me.)

I do make occasional typos in this discussion because I never re-read, edit or re-write any of the entries I have been making here. In contrast, I re-read and re-write extremely closely all of the books I write/publish at least a couple of times, including all of the volumes in the BRRAM series. Finding a typo or a lack of full citations in this discussion and claiming it says something about the quality of my published research is like equating a midnight tweet of Neil deGrasse Tyson's with the quality of his books: "Not that anybody inquired, but to say the individual letters 'N-C-A-A' (as in @NCAA) uses fewer syllables than to say 'N-C-double-A'. So to utter 'double-A' does not save you time."

983susanbooks
Edited: Apr 25, 2022, 12:23 pm

>982 faktorovich: "I do make occasional typos in this discussion"

Perhaps you'd like to revisit the "peaked my interest" discussion in light of this admission.

"I never re-read, edit or re-write any of the entries I have been making here."

That's funny. I re-read, re-edit, & re-write nearly all of mine, however short, because I care about clarity, precision, and audience reception. I'm betting a lot of other posters in this thread do the same. Obvious conclusions to be drawn.

984Petroglyph
Edited: Apr 25, 2022, 2:04 pm

>982 faktorovich: Error 1: a semi-colon is needed in this "if"/"then" statement because the clause before and after this interruption have a subject and a verb

No. A semicolon is usually placed between two main clauses. Subclauses and the main clause they're dependent on are usually separated by a comma. Of course an if-clause has a subject and a verb -- that's what makes it a clause, definitionally so. Can you not correctly identify subclauses and main clauses?

From your own post >25 faktorovich:
It is irrelevant if my versions of "Shakespeare" sound like me
If William was imprisoned in this Castle, Henry’s bill would have included charges for the room
If all of their Renaissance shelves/books are misattributed, this is a cataloging problem that librarians have to be aware of.


>982 faktorovich: Error 2: always and still are redundant: either one would deliver the message; though one refers to the future, and the other only up to the present

No. They both mean slightly different things, and therefore both are appropriate. You contradict yourself here in your eagerness to find fault.

>982 faktorovich: Error 4: you should have used the plural "heads", since you used the plural "people"

No. A person only has one head. If prosfilaes had said "people let it go to their heads", they'd be implying that people have more than one.

>982 faktorovich: Error 5: typo in using "then" instead of "than"

One out of five.

>982 faktorovich: Error 6: you have started this thought with the goal of disproving my point regarding the overgrowth of incompetents in top positions in corrupt systems; but you have forgot about this goal, and by the end of this statement, you just restate my own point, without mentioning that you have decided to agree with me.

prosfilaes claims that corrupt systems also contain people who are too naive/idealistic to be corrupt; and people who are too incompetent to be properly corrupt. You're so hasty to crow victory you're stepping over people's actual point.

This is such a Faktorovich mix of comments: careless reading for the purposes of "brisk impressions", mixed with a clear preference for rule-based black-or-white judgments. Her personal branding is on point.

985Petroglyph
Apr 25, 2022, 2:10 pm

While I'm at it:

>972 faktorovich: A "teacher" is "a person who teaches" or gives "information about or instruction in (a subject or skill)". So when you say that I am selling "knowledge" (whereas I am actually donating this knowledge for free), you are saying that I am teaching you. So you must not understand what the terms "teacher" and "student" mean, and that's why I am providing the definitions for these various terms

Faktorovich, your insistence on substituting general dictionary descriptions for people's words has caused you to overlook a very important social cue here: a prototypical teacher-student relationship is one that features a power imbalance as one of its central features. It isn't merely a transfer of information between two variables of the type $Person; there are dimensions of seniority, authority, experience, expertise, knowledge (which is different from information), perhaps even intelligence and superiority (or a relevant subset of these, as the situation may warrant). The power imbalance derives from the teacher-figure usually being higher on those dimensions than the student, with the usual expectations that the student show the teacher respect and deference.

When >969 prosfilaes: said that "Except that we aren't students and you aren't a teacher here {...} We are equals in this forum", they were objecting to you adopting a position above the other participants here, of claiming the advantage of that power imbalance for yourself. They were telling you that that was inappropriate of you.

Denotations aren't the only relevant aspect of lexical semantics; connotations, social semantics and semantic prosody (to name but a few) are hugely important in human communication. Though I accept that those nuances can be hard to grasp in a written-only format.

986paradoxosalpha
Edited: Apr 25, 2022, 3:22 pm

>985 Petroglyph:

But a power imbalance can always be assumed on the basis of sufficient quantities of pedantry! /sarcasm

987Petroglyph
Edited: Apr 25, 2022, 3:39 pm

>986 paradoxosalpha:
Or it might be acquired over time, as repeated comparisons with misunderstood science geniuses accumulate. /sarcasm

988Petroglyph
Edited: Apr 25, 2022, 4:42 pm

>943 Keeline:
The Collins CoBuild dictionary is a great learning tool for those acquiring English as an L2. Its definitions are aimed at learners and attempt to illustrate actual usage patterns.

Every entry in the dictionary is derived from corpus research (some of the people involved in developing this dictionary in the eighties and nineties were pioneers of corpus linguistic techniques), which helps in determining which words to include (for various thresholds of "most common words" -- see the red bubbles to the right of the headword), and the breakdown of senses is based on keywords in context (among other things).

Dictionaries usually illustrate the various uses of a word with example sentences, but Collins pays extra special attention to those: they'll give you actual, real-life sentences from the corpus (so no made-up examples) that highlight typical uses and common collocations. Look at the entry for liable. That pattern is also illustrated in the main explanations:

If people or things are liable to something unpleasant, they are likely to experience or do it

This tells you that a typical use of this adjective is as part of the phrase BE liable to, followed by something negative. Very useful for learners!

Here are some more examples that show learners how to use adjectives. The lemma for peremptory specifies: "usually ADJECTIVE noun", telling you it's typically a prenominal adjective, as opposed to a predicative adjective; a peremptory gesture rather than a gesture that is peremptory. Look at the lemma for true: you say that things are true (i.e. predicative) when you want to mean "fact-based", but when you say that people or things are true NOUN, you are typically expressing approval rather than factuality.

The entry for the verb commit is another great illustration. Look at how the dictionary separates the usages by their transitivity pattern: transitive with the direct object being a crime; ditransitive with some resource as the direct object and the indirect object the goal to which that resource is put; etc.

All of these things -- how common a word is, collocations, which phrases a word occurs in, whether it's followed by a positive or a negative word, whether adjectives are prenominal or predicative (and whether that changes the semantics!), what transitivity patterns a particular verb exhibits -- are extremely useful to ESL people. The CoBuild suite of learning materials is a wonderful resource that brings these subtle patterns to the foreground. (They produce corpus-based grammar books and usage guides, too.)

(Disclaimer: I was assigned this dictionary as an undergrad, and have also assigned it to my ESL students, but I've never worked for them.)
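(For anyone curious how such corpus-derived patterns are surfaced in the first place, here is a tiny Python sketch of the raw collocation counting a corpus lexicographer might start from; the file name is a placeholder, and this has nothing to do with the actual CoBuild pipeline.)

```python
import re
from collections import Counter

# Placeholder corpus file; any large plain-text collection will do.
text = open("corpus.txt", encoding="utf-8").read().lower()
tokens = re.findall(r"[a-z']+", text)

# Count the word that immediately follows each occurrence of "liable".
followers = Counter(
    tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "liable"
)

# In most corpora "to" dominates this list, which is the raw material behind
# the dictionary's "BE liable to + something unpleasant" pattern.
for word, count in followers.most_common(10):
    print(f"{word:>12}  {count}")
```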

989faktorovich
Apr 25, 2022, 9:12 pm

>984 Petroglyph: You are proving my larger point. Grammar and spelling are subjects that are open to eternal debate. For every grammatical rule, there is a contradictory rule somewhere in the books. I do not edit what I say on social media because it is supposed to be a "social" platform where people speak genuinely as they would if they were having a casual chat in person. And unlike in a scholarly periodical, readers can ask on social media for clarification regarding a term/ grammar, and the resulting clarification can immediately fix any potential lack of clarity in the original problem. And most obvious typos do not need to be explained, as they are common mistakes that most readers immediately interpret as-intended despite the erroneous spelling. If I had several typos in each message, it would be a problem; but if I write thousands of words and one of them is a typo; it is far more problematic if readers focus on talking about this minor typo, while ignoring all the rest of the evidence presented in the surrounding passage without typos. Focusing on the typo to avoid reading or comprehending the rest of the argument is an easy escape that detracts from this reader's growth. Before the modern strict standards for spelling, in the Renaissance, there were plenty of varied spellings, and you guys have been saying that modern readers should read these texts in their original spellings, instead of having these modernized. Thus, if I insert typos into all of my messages deliberately (using similarly sounding words/ variants), this should only add to the fun of guessing what my true meaning is (as you are saying it is fun to do with Renaissance texts).

Returning to the correction of specific errors. 1. A subordinate clause is a fragment if it is left without the main clause it is dependent on; in this case, "even if we accept that as true", is a complete thought, or at least as complete as it gets, as the following clause digresses away from this idea into a different idea, "the most corrupt systems always still..."

The rest of your rebuttals are just silly. What I originally stated remains the exact set of corrections that were needed to fix this passage. For example, yes, a "person has one head", but "people have many heads".

990faktorovich
Apr 25, 2022, 9:32 pm

>985 Petroglyph: "Since he was valiant, I honor him. But since he was ambitious, I slew him." --"Shakespeare"/ Jonson, "Julius Caesar" (1599)

The problem is not that I am misunderstanding your meaning or social cues, but that you are deliberately spinning everything I say to find something negative in it. After reading my dictionary-based understanding of the terms "teacher" and "student", you might have agreed with the accuracy of these basic terms. Yes, a teacher is anybody who gives knowledge, and a student is anybody who receives knowledge. Many proverbs that explain that the best of us are life-long students come to mind; for example: "He who flatters me is my enemy, who blames me is my teacher." --Chinese Proverb. If you are responding to the suggestion that you might be asked to become a student again and to learn something new by rebelling that you are too superior to be thus subordinated to anybody... The problem is in the irrationality of your reaction, and not in the Zen vibe of my original statement.

991faktorovich
Apr 25, 2022, 9:51 pm

>988 Petroglyph: This entire post is an example of one of the biggest problems that can occur in written language: digressive tirade that leaps between subjects without tying them together, and instead including an introductory and concluding sentences that make insulting or attacking comments. When any portion of these digressive sub-points is examined closely individually it is proven to be nonsensical. For example: "Look at the lemma for true: you say that things are true (i.e. predicative) when you want to mean "fact-based", but when you say that people or things are true NOUN, you are typically expressing approval rather than factuality." The concluding remark is that when one states "people or things are true"; "true" in this context can be substituted with "approval", but not with "factuality"; this is absolutely false; as the phrase, "X thing is true" cannot be substituted with "X thing is approved", but rather precisely with the opposite of what you are saying, or with, "X thing is factual". If each of your points is taken out for being similarly nonsensical; you are just left with a couple of nativist insults that claim that no author who was not born speaking English can write in English accurately or at native-level. I started taking English classes in third grade, so if you started learning to read English in first grade; we are not far apart in this regard. It is much easier for somebody who speaks multiple languages to understand grammatical/ spelling intricacies than for somebody who only speaks any one language because all languages are interconnected. You appear to have found yourself in a loop where you are yelling: "Foreigners can't speak English good! No foreigner is as good at English as I, since I am a native-Englisher!... etc." Maybe you should be sensitive to social cues that should instruct you that such xenophobic or ethnic discrimination is morally as well as socially wrong.

992prosfilaes
Apr 25, 2022, 10:33 pm

>990 faktorovich: Many proverbs that explain that the best of us are life-long students come to mind;

But you aren't listening to us. You're telling us it's better to be students, but you want the authority of being a teacher.

>991 faktorovich: you are just left with a couple of nativist insults that claim that no author who was not born speaking English can write in English accurately or at native-level.

Good example. You're talking with a large set of mostly well-educated native English speakers. You could play the student; even if you were a native speaker, you would not always be perfect. Instead you argue with everyone else, treating it as merely insults.

993anglemark
Apr 26, 2022, 2:57 am

>992 prosfilaes: I (Johan here, for once) have a B. A. in English, I started learning English in 1973, I have spent months in English-speaking countries, I write in English as my day job. I learn things from my university-educated native-speaking colleagues and friends every day, and lap it up with gratitude, because it improves my proficiency in English.

994Petroglyph
Edited: Apr 26, 2022, 12:52 pm

>989 faktorovich:
There's that peculiar mix of moving the goalposts, obsession with spelling, self-assuredness, and (to borrow lorax' apt expression in >732 lorax:), fractal wrongness.

I, for one, am not your teacher, and I'm not being paid to teach you remedial grammar. So I won't.

>990 faktorovich:
Because I'm nice I'll assume misunderstanding rather than malice.

This post misses the point of the post it was a reply to, which is this: the generic dictionary definitions (i.e. the denotational semantics) with which you substitute people's actual words in context are insufficient to capture their relevant import. That often lies in connotations, semantic prosody, pragmatic inferences (such as that power imbalance) that a dictionary substitution fails to capture. Some people have a hard time dealing with non-denotational meanings, but they are central to communication.

True synonyms are vanishingly rare: two words that have the exact same denotation and connotation / social semantics / pragmatics / etc. Substituting one word for one in the same semantic neighborhood, as you are wont to do, all but guarantees misunderstandings.

In limiting yourself to "dictionary-based understanding of the terms", you're liable to ignore the actual points that people are making. In fact, I suspect that much of the confusion and nonsensicality you've experienced in this thread is due to this practice of yours of removing individual words embedded in a rich inferential context and substituting them with the bare denotation of an ill-fitting replacement. Doing so may seem like a logical and straightforward and direct "find-and-replace" operation to you, which is why you do it so often, but it is not how other people think or talk.

Or perhaps it's something else. By the way you've ended your post (implying I'm immature because I feel too superior to be placed in the role of your student; presenting yourself as the real misunderstood party), maybe I am allowed to conclude that "winning" is why you keep doing this. Sad.

995Petroglyph
Apr 26, 2022, 12:45 pm

>991 faktorovich:
Well, this is an on-brand yet unexpectedly intense mess of unwarranted vitriol to be lobbed at me gushing about the wonderfulness of a set of learning tools. A post that wasn't even for you, but a response to Keeline.

Pity that so many of the foundational assumptions of #991 are not true.

You're confused about predicative and attributive use of adjectives. Reread your post where you talk about "X thing is true" (the predicative use) and see if that dictionary I was paraphrasing claims "approval" as a connotation for "he said it was true" (predicative) or for "a true genius" (attributive). See if that alters the thrust of your accusations of wrongness and nonsensicality.

I wrote "you say that things are true (i.e. predicative) when you want to mean "fact-based", but when you say that people or things are true NOUN, you are typically expressing approval rather than factuality" (emphasis added). You turned that into the black-and-white statement "The concluding remark is that when one states "people or things are true"; "true" in this context can be substituted with "approval", but not with "factuality"; this is absolutely false".

Of course that's absolutely false. You've twisted my words until they become absolutely false -- not for the first time, either.

a couple of nativist insults that claim that no author who was not born speaking English can write in English accurately or at native-level.

Are you assuming my L1 is English? It is not. It's my second language. You're also seeing insults where there are none. You're also also putting unpleasant and untrue words in my mouth. Again.

if you started learning to read English in first grade; we are not far apart in this regard

What the hell? Now you're just making up my schooling history? Have you got me confused with someone else? Or is this another hasty assumption that's absolutely incorrect?

all languages are interconnected

Meh, who knows? Certainly not you. You cannot know, in fact, because nobody knows. It's unlikely we'll ever know (barring the invention of time-travel).

"Foreigners can't speak English good! No foreigner is as good at English as I"

You're assuming I've said or implied this. I don't think I have. Your martyrdom is imaginary. Or maybe you're confusing me with someone else. I interact almost daily with L2 speakers of English who speak it better than I do.

"since I am a native-Englisher!."

Again, I am not. English is my L2.

996Petroglyph
Apr 26, 2022, 12:50 pm

So I thought I could post another Lunch Break Experiment (tm). Do let me know if you think these are unproductive additions to a thread that has mostly run its course, and I'll stop posting them here and just keep them to myself. (I've been throwing a few impromptu corpora at stylometric software, and I have enough notes and quick little one-off tests to fill a few more posts.)

Today I want to play around a little with the works of James Joyce.

I don't have any particular goal in mind here -- this post contains some exploratory tinkering with R:Stylo involving a tiny dataset consisting of James Joyce's works. Far, far upthread the possibility was raised that Joyce's famously eclectic style would be a good test case for stylometric analysis, and given that no-one has taken up the challenge, I thought that I would look into the matter a little myself. And then I realized that I could polish my notes a little bit and organize some graphs and post them here.

Right. Let's get on with it.

From Project Gutenberg I downloaded four of J.A.A. Joyce's works: Dubliners (1914), A Portrait of the Artist as a Young Man (1916), Exiles (1918) and Ulysses (1922). (His full name was James Augustine Aloysius Joyce, which is Tolkienesque in a very Bandersnatch Cummerbund kind of way). Finnegans Wake is still under copyright, and I don't own a digital copy of that book, so no dice. I cleaned the .txt files by removing all ProjGut legal boilerplate, forewords, dedications, tables of contents, etc.
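(By way of illustration, the legal-boilerplate part of that cleaning step can be automated. The rough Python sketch below assumes the usual "*** START OF ..." / "*** END OF ..." marker lines that Project Gutenberg plain-text files carry, uses a placeholder file name, and leaves forewords, dedications and tables of contents to manual trimming.)

```python
import re

def strip_gutenberg_boilerplate(raw):
    """Keep only the text between the '*** START OF ...' and '*** END OF ...' markers."""
    start = re.search(r"\*\*\* ?START OF.*?\*\*\*", raw, flags=re.I)
    end = re.search(r"\*\*\* ?END OF.*?\*\*\*", raw, flags=re.I)
    body = raw[start.end():end.start()] if start and end else raw
    return body.strip()

# Placeholder file name for a downloaded Gutenberg plain-text file.
with open("ulysses_gutenberg.txt", encoding="utf-8") as fh:
    clean = strip_gutenberg_boilerplate(fh.read())
print(clean[:200])
```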

The corpus isn't quite homogeneous: Exiles is a play (which can be expected to display different vocabulary patterns than narrative fiction), and Dubliners is a short story collection, not a novel. We're also going to need to add more authors in the mix, but we'll do that later.

To begin with, let's try some exploratory graphs with just the works by Joyce, just to see how the software judges their internal relationships.

Here's a scatterplot of all four works (1000 most frequent words (MFW); covariance matrix). Open in a new tab to embiggen.



The X-axis, which covers 86% of the differences between these texts, is mainly about separating Exiles from the other three -- which makes a lot of sense: it is the only text that is nearly all dialogue. Of the others, Portrait and Ulysses are more similar to each other than they are to Dubliners.
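(For anyone who wants to tinker along without installing R, a crude Python analogue of this kind of most-frequent-word scatterplot might look like the sketch below; the file names are placeholders, and this is only loosely comparable to what stylo actually computes.)

```python
import re
from collections import Counter
import numpy as np
from sklearn.decomposition import PCA

# Placeholder file names for the four cleaned texts.
files = ["dubliners.txt", "portrait.txt", "exiles.txt", "ulysses.txt"]
texts = [re.findall(r"[a-z']+", open(f, encoding="utf-8").read().lower()) for f in files]
counts = [Counter(t) for t in texts]

# The 1000 most frequent words across the whole corpus.
overall = Counter()
for c in counts:
    overall.update(c)
mfw = [w for w, _ in overall.most_common(1000)]

# Relative frequency of each MFW in each text.
freqs = np.array([[c[w] / len(t) for w in mfw] for c, t in zip(counts, texts)])

# Two principal components, loosely comparable to stylo's PCA scatterplot.
coords = PCA(n_components=2).fit_transform(freqs)
for name, (x, y) in zip(files, coords):
    print(f"{name:>14}  PC1={x:+.4f}  PC2={y:+.4f}")
```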

Upthread, it was hypothesized (by me, I think) that the individual chapters of Ulysses, written as they are in very different styles, might be attributed to different authors when subjected to stylometric analysis. Turns out, maybe? Kinda? Ish? I've never read Ulysses, but someday I will, and then perhaps I'll be able to say something cogent about the results below.

Here's a scatterplot of the individual chapters of Ulysses (cosine delta). Again: open in new tab to embiggen.



Some 32% of the variation is covered by the horizontal axis. It seems to be about separating chapter 17 from the rest. It appears that chapter 17 is written in an artificially formal style with lots of over-the-top sciency talk that deliberately covers a plethora of different subjects in a Q&A format. That could certainly be an explanation for why it is judged as an outlier 1000 MFW-wise! A few other chapters seem to be trailing in that direction as well (12, 14, 16). As for the vertical axis: there's a small cluster made up of chapters 13 and 18 in the bottom left; and the rest clusters together in the top left. But the Y-axis difference covers 14.8% of the variance, so I'm not sure how different these chapters really are from the rest.

Conclusion: there's a few chapters that look like real outliers. It's not like the novel is all over the place all of the time, 1000 MFW-wise, but there is considerable variation across chapters. Still: much of the book seems to be relatively coherent, as far as vocab goes. Or perhaps Joyce's attempts to use different styles were no match for the computational juggernaut that is 1000 MFW. If so, the same scatterplot for 100 MFW (i.e. a greater proportion of function words) would show more difference. Turns out, not really:



I'd have to run a function-word test, or a punctuation test perhaps to take a more granular look at this. Perhaps for a future Lunch Break Experiment (tm).

For this next graph, I've plonked both Dubliners and Portrait in the corpus together with the individual chapters from Ulysses.



In this type of graph (as in the others in this post), distance equals difference. The longer the lines are between individual texts or between subclusters of texts, the greater the differences are between them. That said, Portrait and Dubliners are clearly much closer to each other than they are to any of the Ulysses chapters, by quite some margin. Chapters 13 and 18 are again a distinct subcluster; 17 is unpaired, but closest to other semi-outlier chapters 14 and 16. There's a blob of fairly similar green chapters in the middle.

Note how, in the first graphs above, there was quite a distance between Portrait and Dubliners, and here they're quite close together. What has changed? Well, the way we're treating Ulysses has changed. The individual chapters of that book show a lot of internal diversity, which gets averaged out when the novel is looked at in its entirety. When separated out like that, the individual chapters and their diversity change what the baseline is for the distances between other parts of the corpus. Overall, while the entirety of Portrait is more similar to the entirety of Ulysses than to Dubliners, Portrait and Dubliners are more similar to each other than they are to any individual chapter of Ulysses.

I remember from reading Portrait that there's a massive sermon in the middle of that novel that amounts to 12% of the entire novel, as measured by my ereader. And there's long monologues about art and criticism towards the end, as well, IIRC, which is un-novel-like. So it may be the case that there's some internal diversity within Portrait, too -- making it more similar to the eclectic Ulysses than to the fairly coherent Dubliners.

Whatever the case may be, it is important to remember that these graphs measure distances not in any absolute sense, but always with respect to the particular corpus you've run the test on. The corpus of full-text J.A.A. Joyce fiction has a different profile than a corpus where the eclectic nature of Ulysses is brought to the fore and where the outlier that is Exiles is out of the picture.

What happens if you ask the software to calculate the distances between the individual chapters of Ulysses, the individual stories in Dubliners, and Portrait? Here's a cluster graph:



At this small scale, Ulysses is consistently different from Dubliners, and Portrait fits comfortably with the former. The diversity within Ulysses is much greater than that within Dubliners: for the former it takes a greater distance to get from one individual chapter to another, and the lines between individual subclusters are much longer, too. In other words: when Dubliners, a fairly coherent short story collection with a fairly uniform style, is allowed to spread out its coherence across multiple texts (i.e. the fifteen individual stories), the baseline for distances between the other texts in this particular graph has changed again: the software now judges the main separation to be one between the coherent Dubliners on the one hand, and the more eclectic Ulysses and Portrait. Still, several of the stories in Dubliners are short -- below the point where stylometric analysis on the individual stories becomes somewhat reliable, so that's as far as I'll take that.

I can't be bothered right now to break up Portrait into its component chapters and plot them against Ulysses and Dubliners. Maybe some other time.

Alright. So much for Joyce's works on their own. Let's throw in some other authors -- the same ones I used for my Lunch Break Experiment (tm) in >252 Petroglyph: Jane Austen, the three Brontës, and Marie Corelli. I also threw in Charles Stross' SF short story collection (freely available here), just to add in a legitimate outlier (that sometimes helps clarify patterns elsewhere).

Using multidimensional scaling (MDS), this is how R:Stylo judges the distances between all of these texts. I've also added a cluster analysis -- these two tests measure slightly different things. Open in a new tab for larger versions:



Almost all authors form their own little cluster: Austen in green, Anne Bronte in red, Charlotte Bronte in blue, Corelli in black, and Joyce in purple. Emily Bronte's Wuthering Heights in orange is judged to be close to Corelli's work; Stross' Accelerando in grey is judged to be more similar to Joyce's Exiles than to any other work in this corpus.

Does this mean that Corelli's works and Wuthering Heights or Accelerando and Exiles were written by the same person? Of course not! We've just haphazardly thrown a more or less random bunch of texts together and told the software to connect them. And so the software does what it's told. Of course weird connections are going to be made! This is what it looks like when the incoherent and unprincipled nature of this ridiculous corpus bites us in the bum.

Put differently: Stross's odd-one-out book is merely judged to be more similar to Joyce's odd-one-out text than to any of the other works in this corpus, which, given the genres and the timeframe, is not all that surprising. Similarly, Corelli's artificially archaic romances from the 1890s might well fit in with the language and style of the dramatic Wuthering that came half a century before them, and not all that close to the Austen novels, which were written a century earlier. The software isn't saying that Wuthering was written by the same person who wrote the three Corelli books; it's merely saying these are most similar when compared to the other texts in this particular corpus. (I wonder what would happen to this similarity when compared to a more coherent/representative corpus of other mid-nineteenth-century romances. Did Corelli, mayhap, crib from the Brontës? Another time, perhaps.)

In that cluster graph to the right, there's not much to comment on. Except, perhaps, that the three most recent authors, Corelli, Joyce and Stross, are grouped together (Looks like Corelli isn't archaic all the time), before the Brontë branch joins them, and finally the Austen branch. There's a clear chronological dimension here. The Brontës, as usual, form a tight cluster with short distances between the individual works.
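(Continuing the Python aside: once a texts-by-words frequency matrix exists, a rough cousin of the cosine-delta distances and cluster graphs shown in this post can be produced with scipy. The matrix below is random stand-in data purely to keep the sketch self-contained and runnable, and stylo's own measures are more refined.)

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram

# Stand-in data: rows are texts, columns are relative MFW frequencies.
# In a real run this would be the matrix built from the actual corpus.
labels = ["dubliners", "portrait", "exiles", "ulysses"]
freqs = np.random.rand(len(labels), 1000)

# Burrows-style standardisation: z-score each word column across the corpus.
z = (freqs - freqs.mean(axis=0)) / (freqs.std(axis=0) + 1e-12)

# "Cosine delta" is, roughly, the cosine distance between z-scored profiles.
distances = pdist(z, metric="cosine")

# Agglomerative clustering, comparable in spirit to stylo's cluster graphs.
tree = linkage(distances, method="average")
order = dendrogram(tree, labels=labels, no_plot=True)["ivl"]
print("cluster order:", order)
```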

For these next graphs, I removed both outliers (Accelerando and Exiles) from further consideration. Instead, I broke up Ulysses into its component chapters again:



This MDS plot on the left is mainly about separating Joyce's works (in purple) from everyone else -- clearly Joyce's works are really very different from the other authors in this corpus (who, nonetheless, still form their own clearly delineated subclusters!).

These two graphs are the result of slightly different measurements: in the MDS, Wuthering is closest to Anne Brontë's work, and it is Charlotte Brontë whose works are judged to be most similar to Corelli's. In the cluster graph, however (just like in the previous cluster graph), it is the three Brontës who are closest together, 1000 MFW-wise, before Corelli's works join their branch. Again, E Brontë and A Brontë are more similar. But the distances among the Brontës are pretty small overall, here as in the other graphs.

Conclusions

I can think of only two properly serious sciency conclusions that have been kinda sorta illustrated by this tinkering of mine:

  1. I guess I found out that some of the chapters in Ulysses are very different from the rest of that book, and indeed from the rest of J.A.A. Joyce's oeuvre, which, from the looks of it, is fairly coherent otherwise (as 1000 MFW reckons things, at least). Within the inadequate confines of the present corpus, that diversity doesn't quite rise to the level of "I can't believe it's Joyce!", but they're pretty divergent.
  2. Not surprising in the least: the representativeness of your corpus matters. It matters a lot -- to the point of making or breaking your results. Finding out that Joyce is very different from a range of 19th-century authors who wrote at least three quarters of a century earlier (or who attempt to write like they belonged in a different era) is no real conclusion (A mild form of "Garbage in, garbage out"). With a corpus consisting of authors and novels that are closer to Joyce's style I could perhaps find out whether the diversity within Ulysses is something that other works or authors from that era have in common, or whether Joyce is more or less unique in this respect. Who knows -- the diversity within Ulysses may even seem much greater!




Right. That's about all I have time for today. But this little project has whetted my appetite for some more serious tinkering with the works of J.A.A. Joyce (my interest has been piqued, one might say, though it has not yet peaked), and then I will prepare a more serious, a more representative corpus featuring other experimental, introspective modernists from the 1910s and 1920s. Woolf is an obvious candidate. D.H. Lawrence, perhaps? Conrad? Faulkner? Mansfield? Djuna Barnes? Ronald Firbank? I'll have to look around for a few more.

I also realised the other day that my partner and I own many of the books by Iain (M.) Banks, most as digital copies, and probably enough to run a comparison between the two bodies of work. I'll try and see if I can get clean txt-versions from our ereaders.

997Petroglyph
Apr 26, 2022, 1:12 pm

>993 anglemark:

I work at a university with many L1 English speakers (as well as L2, L3 etc), so I'm constantly pelted with nice little usage patterns that are just right there for me to observe and incorporate into my own idiolect. It's wonderful!

998Stevil2001
Apr 26, 2022, 1:13 pm

I was going back through some of the high points of this thread, and reread Petroglyph's excellent demonstration of stylometric analysis using the Baum/Thompson Oz novels. Suddenly this apocryphal, unknown fragment popped into my head: http://newwwoz.blogspot.com/2010/07/baums-last-oz-story.html

Supposedly Baum wrote it, but many (including me) doubt this. I wondered if stylometric analysis might tell us something about it, or if it was too short.

999faktorovich
Apr 26, 2022, 1:20 pm

>992 prosfilaes: The discussion regarding "student"/"teacher" in this thread began with Petroglyph's statement: "I am not your daddy, your mummy, your teacher, your mentor, your colleague, your student, your editor, your pet-sitter, your sassy gay friend, *or* your weed dealer." I do not know why you guys keep returning to these dualities of "teacher"/"student", as such questions are irrelevant from my perspective of conducting and sharing my research. You guys have also been asking why I am here responding to questions, and I have explained that I am volunteering answers to all questions that are raised here about my research, to assist all who visit with being able to understand it. Since this thread is about the 14 volumes I have published in my BRRAM series that re-attributed the Renaissance; I am technically the "teacher" who is being asked questions about these revolutionary findings. None of you have brought in any new findings of your own, nor any rational reasons to contradict my findings. So if any of you are trying to teach something, it appears to be mostly about xenophobia and your nativism.

1000lilithcat
Apr 26, 2022, 1:25 pm

Only because I want to see post 1000!

1001Petroglyph
Apr 26, 2022, 1:36 pm

>1000 lilithcat:
Nice one! Congrats!

1002faktorovich
Apr 26, 2022, 1:41 pm

>994 Petroglyph: Imagine if people were communicating only in emojis. Let's say, I wanted to explain something about Renaissance linguistics and how the emojis in those days differed. The "praying hands" emoji back then was in a slightly different color, and meant specifically the practice of praying to the Christian God. I could show the Renaissance version of the emoji, but it would be difficult for me to express how the meaning of this emoji changed over time with other emojis (at least not without drawing some new emojis of my own for specificity). Thankfully we are not limited by being forced to only use emojis in our communication system. The denotations or dictionary definitions for words help us all explore what precisely a user is trying to say when they use specialized language, or appear to be using words in an unusual way. Connotations are important for coloring a conversation with layers of implied or suggested meaning, but the interpretation of connotations are intended to be open-ended; it is up to each individual reader if they choose to think of a potential metaphor, symbolism, specialized terminology that is not yet in dictionaries, and the like. A writer thus should not rely on all readers reaching the same connotation-based understanding of a written text. The pure denotation meaning is of the primal importance, and so writers have to check that the dictionary-meaning of what they are saying matches what they were attempting to communicate. If what somebody is saying becomes nonsensical merely by substituting in dictionary definitions or synonyms of even a single word in a sentence; then, the problem is with the author's misuse of the term in a manner counter to the dictionary definition. Reading is not a process of mind-reading or psychically connecting with the author's unstated intention, but rather the process of taking in the surface meaning, or dissecting the details of the words for the precise imbedded meaning.

1003Petroglyph
Apr 26, 2022, 1:47 pm

>998 Stevil2001:

Hmmm. Only 1169 words. That is below the threshold for reliable results, but I'll give it a try.

Here's a scatter plot of the Baum part of the corpus from my earlier Oz posts. (300MFW; open in new tab to embiggen)



Likely not by Baum then.

I can't be bothered to compile a corpus for other known contributors to the Oz canon to find out who it is by (a commenter on that post you linked suggested Kenneth Gage Baum). But I do have a corpus for Thompson. So let's take a look at that:



Nope, that fragment is not likely to be by Thompson either.

I also tested these for 100 MFW (i.e. more function words), with the same results.
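(For anyone who wants to fiddle with this kind of short-fragment test, the bare-bones logic can be sketched in Python: build a frequency profile for each candidate author's known works and see whose average profile the fragment sits closest to. The file names and the tiny word list below are placeholders, and the distance is a crude mean absolute difference rather than stylo's delta measures.)

```python
import re
from collections import Counter
import numpy as np

def profile(paths, vocab):
    """Mean relative frequency of each vocab word across a set of texts."""
    rows = []
    for p in paths:
        tokens = re.findall(r"[a-z']+", open(p, encoding="utf-8").read().lower())
        counts = Counter(tokens)
        rows.append([counts[w] / max(len(tokens), 1) for w in vocab])
    return np.mean(rows, axis=0)

# A real run would use the ~300 most frequent words of the reference corpus;
# this hard-coded list of common function words stands in for it.
vocab = ["the", "and", "of", "to", "a", "in", "that", "it", "was", "he"]

# Placeholder file names for the candidate corpora and the disputed fragment.
candidates = {"Baum": ["baum1.txt", "baum2.txt"],
              "Thompson": ["thompson1.txt", "thompson2.txt"]}
fragment_profile = profile(["fragment.txt"], vocab)

# A 1169-word fragment is below the usual reliability threshold, so treat
# whatever comes out of this as a hint, not an attribution.
for author, paths in candidates.items():
    delta = np.mean(np.abs(profile(paths, vocab) - fragment_profile))
    print(f"{author:>9}: mean absolute difference = {delta:.5f}")
```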

1004faktorovich
Apr 26, 2022, 2:03 pm

>995 Petroglyph: After I read the grammatical post you linked to, it became clearer what this statement was trying to state: "you say that things are true (i.e. predicative) when you want to mean 'fact-based', but when you say that people or things are true NOUN, you are typically expressing approval rather than factuality." However, the original statement had not linked to a page describing the predicative vs. attributive usage, and it had not even mentioned attributive at all. There was no parallel construction between the first version of the statement and the second to explain what grammatical rule the author was stressing. Here is how a rational version of this statement should have looked with a parallel construction. Predicative: "The statement is true (factual)." Attributive: "You are a truthful (approval) girl." Conclusion: Do not confuse these usages. With the grammatical rule clearly stated in this manner, it is apparent that I have never made the error of confusing these usages, as I have been finding falsehoods in specific statements, and only if most of the statements are false have I been concluding if the speaker is overall being untruthful.

We know all languages are interconnected because, if they were not, some humans would have entirely different forms of communication, such as beating out rhythmic patterns or making patterns of facial expressions. The development of letters and sounds to communicate was a technology that was shared between migrating humans, and so language is interconnected at these roots.

1005anglemark
Edited: Apr 26, 2022, 2:53 pm

>996 Petroglyph: Oh excellent, more lunch break experiments! Thank you!

Edited to add: Gertrude Stein, perhaps?

-Linnéa

1006faktorovich
Apr 26, 2022, 2:51 pm

>996 Petroglyph: There is no need for me to duplicate and test this entire experiment to prove it uses an erroneous method, as I have already demonstrated this in the previous Lunch experiment. If you open the Excel file I uploaded to GitHub for my Lunch experiment, you can scroll over to the 3-word phrases columns as you try to understand what the experiment's conclusions are beyond the purely quantitative graphs. For example, you can search for the phrase "I am sure", and you will notice that it appears among the top-6 phrases only in all six of the "Austen"-bylined texts, and in none of the otherwise bylined texts. Then, when you check the diagram for the "Data Summary", you will notice that there is a giant gray box of matches between all of Austen's texts. Thus, the phrases-test confirms the combined results of the 27-tests. And the matches between phrases are extremely unlikely to be coincidental. To check this, you can look at the 284-texts data for my Renaissance experiment on GitHub; you will notice how rare shared phrases are even when a corpus only has the texts of six ghostwriters. Austen's preference to overuse "I am sure" is very unusual and thus is a character-trait that alone should help to intuitively locate any texts she might have ghostwritten. The data for each of these individual tests is preserved in my data-sets because it similarly helps to trace the many unique elements of a writer's linguistic pattern. These details help to create a rich conclusion that relies on many types of evidence.

Instead, you are just feeding these texts into a tool that returns the conclusions that position texts on divergence diagrams, without giving any details on what elements actually led to these conclusions. Without these details, you are forced to imagine if one chapter is a sermon, while another is more digressive etc. You are guessing why they might have registered as different or similar without being able to figure out how they precisely diverged in their data-points.

I took an entire PhD class on "Ulysses", where I read all of it very closely. One of your mistakes: there are no chapters in Joyce's original version of "Ulysses" (the breaks in Gutenberg were added by a later editor); the absence of chapter-breaks is one part of the experiment Joyce was conducting (it is a cyclical story without a beginning or end that digresses endlessly without caring where it's going; according to his biography, Joyce was suffering from extreme alcoholism while he was writing "Ulysses" and he happened to have found a publisher who did not care if he did not edit or control his writing-bowels in any way). The separation into books would have been the printer's prerogative due to the whole not fitting in a single volume.

All that aside, your interpretation of the data continues to be dominated by your biased belief in the authenticity of bylines such as "X Bronte", over the data telling you that at least two of the "Brontes" repeatedly cluster more like "chapters" out of a single book (such as "Ulysses") than as unique authorial signatures would if there were three unique authors between the three "Brontes". Even if I perform this full test with all of the texts you are proposing, I would yet again present overwhelming evidence of the accurate quantitative attributions for this corpus, and you would counter my findings by stating that you simply know that the bylines are true and cannot be contradicted.
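(Purely as an illustration of the kind of 3-word-phrase count described above, and not Faktorovich's actual 27-tests spreadsheet, here is a toy Python sketch with a placeholder file name; run over the Austen novels, this is the sort of top-six list in which the post above says "I am sure" turns up.)

```python
import re
from collections import Counter

def top_trigrams(path, n=6):
    """Return the n most frequent 3-word phrases in a plain-text file."""
    words = re.findall(r"[a-z']+", open(path, encoding="utf-8").read().lower())
    trigrams = Counter(zip(words, words[1:], words[2:]))
    return [(" ".join(t), c) for t, c in trigrams.most_common(n)]

# Placeholder file name; any novel-length plain-text file will do.
for phrase, count in top_trigrams("emma.txt"):
    print(f"{count:5}  {phrase}")
```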

1007faktorovich
Apr 26, 2022, 2:56 pm

>998 Stevil2001: It is not so short that it cannot be tested with computational-linguistics. "Baum Bugle" has been published since around the 1950s/60s, so it is unlikely that all of the relevant texts would be in the public domain, and thus testing this corpus would present technical difficulties.

1008Keeline
Apr 26, 2022, 2:57 pm

>1003 Petroglyph:

Frank Joslyn Baum (December 3, 1883 – December 2, 1958) wrote The Laughing Dragon of Oz, an unauthorized Oz story published as a Whitman Big Little Book. This was first published in 1934. Its copyright was renewed, though, so an electronic text may not be readily available. The BLBs have very little text since half of the small pages are comic-strip-style illustrations.

He is someone to look at if we can get a text for analysis.

James

1009Petroglyph
Apr 26, 2022, 4:12 pm

>1008 Keeline:
Scrolling through the dozens of different authors who wrote one or more Oz books, it looks like it'd be relatively easy to compile a subset with just those authors that could have conceivably written a fragment first published in 1965/66. Acquiring the texts, as you correctly note, is the hard part.

I'd still be willing to look into it, but only if someone pays me and assists me in securing machine-readable texts!

1010Petroglyph
Apr 26, 2022, 4:13 pm

>998 Stevil2001:
Don't forget that time when you became the main character around the mid-800s!

1011Petroglyph
Apr 26, 2022, 6:42 pm

>1004 faktorovich:

On an unrelated note: if I ever do a talk about improper argumentation techniques such as DARVO, could I use this post as an example? I'll provide full attribution, obviously.

(I'm not even going to touch the r/woosh mess that is this bit, else we'd be here all night:)
Predicative: "The statement is true (factual)." Attributive: "You are a truthful (appoval) girl."


1012prosfilaes
Modificato: Apr 26, 2022, 7:02 pm

I actually played around with Stylo a bit as well.



That's the first issue of Weird Tales, from Wikisource. I was looking for the author of "The Young Man Who Wanted to Die", who is named as "? ? ?" in the magazine. I clearly don't have enough data, as I only have one known work of each author, even if I did include the second part of Otis Adelbert Kline's novelette from issue two as a separate work. Assuming it's correct, if the author wrote two works for the issue, it could be David R. Solomon; since all I can find about him is that he's credited for the one work, that doesn't help. Farnsworth Wright and Kline and Rud are the more prolific authors in the set, but none of them are close enough to really justify transcribing more of their works.

(Yeah, more Weird Tales is in the public domain, but no more full issues have been transcribed that I've seen. I could add stories from Lovecraft and Clark Ashton Smith and Robert E. Howard, but it doesn't match any of their styles.)

1013Petroglyph
Apr 26, 2022, 7:06 pm

>1002 faktorovich:

So you are aware that denotation is merely one portion of lexical semantics (and indeed, sentence semantics, paragraph semantics, etc.). You have made a deliberate choice to stay away from non-denotational meaning in favour of easy-to-search dictionary definitions. Gotcha.

TL;DR Faktorovich, self-proclaimed "literature scholar" who studies Early Modern texts, expounds her assumptions on how dictionaries figure in the writing process, how writers relate to their audience, and how reading ought to centre on "the surface meaning".

(On a completely unrelated note: search this thread for the word "unbiased".)

the interpretation of connotations are intended to be open-ended; it is up to each individual reader if they choose to think of a potential metaphor, symbolism, specialized terminology that is not yet in dictionaries, and the like.

Your view of the role of dictionaries in people's writing is... unlike that of professional linguists. Let's leave it at that.

A writer thus should not rely on all readers reaching the same connotation-based understanding of a written text.

You know, one of the many, many difficulties of reading texts from societies and cultures and such that no longer exist is, indeed, figuring out what they meant, in lots of non-denotational ways (connotationally, contextually, pragmatically, socially, ...). There's plenty of texts that made perfect sense in context and whose denotation we get, but that today are cryptic: examples include Sumerian jokes, or some of the graffiti in Herculaneum and Pompeii, or even a few Viking-era runic post-its. It's probably asking too much of a "literature scholar" who self-publishes dozens of books on Early Modern texts to be sensitive to non-denotational meaning. Or, you know, the difficulties of reading a text whose primary audience no longer exists, without imposing 21st-century ideas onto a 17th-century horizon.

The pure denotation meaning is of the primal importance

Citation needed. Did Fletcher and Ben Jonson think so, too? Are you confusing ease-of-looking-up with primary meaning? Are you confusing easy-either-or-decisions with truth?

and so writers have to check that the dictionary-meaning of what they are saying matches what they were attempting to communicate.

Citation needed. And again: that view of the role of dictionaries in people's writing is... unlike that of professional linguists. Let's leave it at that.

If what somebody is saying becomes nonsensical merely by substituting in dictionary definitions or synonyms of even a single word in a sentence; then, the problem is with the author's misuse of the term in a manner counter to the dictionary definition.

Citation fucking needed. (Also, semicolon misuse). Also, you have some weird and pre-theoretical and unfounded and curiously naïve views on dictionaries. Oops, did I say that out loud? I meant: "unlike that of professional linguists".

Reading is not a process of mind-reading or psychically connecting with the author's unstated intention, but rather the process of taking in the surface meaning, or dissecting the details of the words for the precise imbedded meaning.

False dichotomy (either surface meaning, or else psychic powers).

Reading is {...} the process of taking in the surface meaning, or dissecting the details of the words for the precise imbedded meaning

Citation. Fucking. Needed.

1014faktorovich
Apr 26, 2022, 8:48 pm

>1008 Keeline: If a text has comic illustrations, it is safe to assume it was ghostwritten by a regular children's book staff-editor. Children's books must be precisely formulaic to fit school board rules. The structural bias of these rules is now especially apparent as they also have to pass the don't-say-gay-and-race standards in many states. Even if a children's book is not purchased by thousands of schools, it still has to fit the formula to get on a reading-list; and children's books are made by only a few publishers who limit entry into this marketplace from independent publishers. For these and many other reasons, there is no need to test any children's book to know it was cooked in a lab.

1015faktorovich
Apr 26, 2022, 8:50 pm

>1009 Petroglyph: If you wait for somebody to pay you to test a children's book's authorship, the only entities likely to do so are children's book publishers themselves, who would probably want to pay for the result that their books are authentic and that any versions published elsewhere are encroachments on their copyrights. Performing computational-linguistic analysis for-hire is thus as biased as performing cancer experiments for a cigarette manufacturer.

1016faktorovich
Apr 26, 2022, 8:56 pm

>1011 Petroglyph: One improper argumentation technique is to quote a quote within a quote of a quote and to claim it is an example of an error without explaining just where the error is and who made it. The "Predicative..." line is me writing out what another writer in this thread was trying to say but failed to say, because the statement was not put in a parallel construction with logical indicators. It is a general rule of copyright that you can quote short fragments out of any text without asking the person you are quoting for permission (especially if the text was made in a public forum or has been published). The rules restrict the quantity you can quote, but up to around a paragraph out of a book is fair use.

1017faktorovich
Apr 26, 2022, 9:05 pm

>1012 prosfilaes: When you create a diagram and use the same colors for different texts, there has to be some logic behind your color-choice; for example, the orange texts might be under the byline of author-X, while the green texts are assigned to author-Y. There is no such logic in your diagram. What this mass of titles shows is that this whole issue of "Weird Tales" was probably ghostwritten by a single hand as the titles form close clusters that perhaps semi-split into two branches (so perhaps 2 hands). You even point out that at least one of these bylines did not appear on any other texts (a strong indicator of a use of a pseudonym). And you don't even make any concluding remark as to what you imagine this diagram signifies regarding the authorship attributions.

1018Keeline
Apr 26, 2022, 9:15 pm

>1009 Petroglyph:

Interestingly, some of the later works (including about half of the Thompsons and all of the Jack Snows, but none of the John R. Neill stories) failed to have their copyrights renewed, so they are very findable ( https://gutenberg.org/ebooks/search/?query=Oz ). These will be easiest to process, as you have already experienced.

This list is more accurate than the regular Wikipedia entry.

https://oz.fandom.com/wiki/Copyright

James

1019faktorovich
Apr 26, 2022, 9:24 pm

>1013 Petroglyph: First, I thought your opening statements in this post were nonsensical; then, I looked up "TL;DR is an abbreviation for 'too long; didn't read'". Thus, I came to understand that you were saying a bunch of nonsense because you did not read any of the post of mine that you were responding to.

Precisely because the writer of a 17th century text no longer exists and the language has become archaic, a translator has to make sure the modernized text will be understood on a denotation-level without asking modern readers to guess my personal connotational implications; if there were connotational implications in the original text, I explain these in detailed annotations.

"The pure denotation meaning is of the primal importance": this is my own original statement; thus, there is no citation I can give, since this is my own understanding of language usage. "Fletcher" is a pseudonym that Jonson used; so it is repetitive to refer to both "Fletcher" and "Jonson". "Primal" means "essential" and not "easy to look up"; you have again digressed into nonsense.

Then, you keep repeating that every opinion about language that I express must be cited. How can you object to a lack of citation in these abstract cases, while also criticizing me for citing the dictionary definitions for words that you guys are misusing?

Then, you add several insults and slang while attempting to argue that I am not a "professional linguist". At least I am not breaking out into a verbal-vomit of insults.

What I said was an accurate dichotomy as interpreting the surface meaning is indeed different from psychically guessing the intended meaning beyond what is obvious on the surface.

I have published enough scholarly books and articles to be able to form my own conclusions about language usage without citing other scholars.

1020paradoxosalpha
Apr 26, 2022, 9:37 pm

>1019 faktorovich:

TL;DR is a pretty ancient expression in 'net time, and is typically used to point out a stylistic failing from which you frequently suffer: the extended wall of text that offers no paragraph breaks or other visual relief to make it welcoming to the reader. It's almost as if you don't care whether anyone reads what you have written.

1021Keeline
Apr 26, 2022, 9:50 pm

>1014 faktorovich:

You are expressing a complete lack of awareness of this text or the formats I am speaking about. Maybe try a Google search for the title next time?

https://oz.fandom.com/wiki/The_Laughing_Dragon_of_Oz

This unsold auction listing shows a page spread of the interior which is what I was trying to describe:

https://www.pbagalleries.com/m/lot-details/index/catalog/376/lot/116246/The-Laug...

This story was written by the son of L. Frank Baum, Frank Joslyn Baum, without permission, for Whitman. It created a schism in the Baum family when his mother, Maud Gage Baum, sued him and the publisher. The family won. Frank J. Baum was the first president of the International Wizard of Oz Club.

I merely stated that comparisons with other Oz authors should include him since he long had an association with the family franchise.

James

1022Petroglyph
Apr 26, 2022, 9:59 pm

>1019 faktorovich:
>1020 paradoxosalpha:

It also illustrates another of Faktorovich's traits: careless reading leads to "brisk impressions" which are promptly, confidently, and peremptorily uttered with authority.

A TL;DR is often also offered as a quick summary of a (much) longer post, as a service to lurkers / low-effort participants, or to help potential readers decide whether or not they want to invest their time in the actual post.

This was the spirit in which I prefaced my fairly tedious post #113 with a TL;DR.

>1019 faktorovich: Thus, I came to understand that you were saying a bunch of nonsense because you did not read any of the post of mine that you were responding to

And I'll quickly add this trait of hers, too: her "brisk impressions" often involve assuming bad faith (broadly speaking) in her critics.

>1020 paradoxosalpha: It's almost as if you don't care whether anyone reads what you have written.

I'd bet money that she'd ascribe that to her audience's lack of willingness and/or effort to partake of her teachings. The blame cannot lie with her; the responsibility lies elsewhere.

1023Crypto-Willobie
Apr 26, 2022, 10:02 pm

>1000 lilithcat:
Darn! I was watching for #1000 but I was away from my laptop too long!

1024lilithcat
Apr 26, 2022, 10:05 pm

>1023 Crypto-Willobie:

Keep an eye out for 1500.

1025Stevil2001
Modificato: Apr 26, 2022, 10:33 pm

>1003 Petroglyph: Thank you kindly! That is interesting, and I appreciate seeing my impressions backed up by the data. I've never read the Frank J. Baum Oz book (I guess few have) so I have little sense of whether he could be the author.

>1010 Petroglyph: Truly a bizarre occurrence.

Delighted to see we've reached a thousand posts. This has been one of my favorite threads on LT. Maybe the real writers of the works of the British Renaissance were the friends we made along the way?

1026prosfilaes
Apr 26, 2022, 10:40 pm

>1017 faktorovich: What this mass of titles shows is that this whole issue of "Weird Tales" was probably ghostwritten by a single hand as the titles form close clusters that perhaps semi-split into two branches (so perhaps 2 hands).

It says nothing of the sort; it would have taken any set of works and turned them into a tree.

You even point out that at least one of these bylines did not appear on any other texts (a strong indicator of a use of a pseudonym).

One of the other stories was by F. Georgia Stroup, who had just this publication, but a researcher traced her down and found her whole life. Having heard stories about the depth of the slush pile and how poorly many of these magazines paid, I suspect that most of the people with just one credit are people who tried their hand at writing but didn't find it fulfilling enough to keep working at it.

And you don't even make any concluding remark as to what you imagine this diagram signifies regarding the authorship attributions.

That wasn't really the point. This wasn't a research paper; this was a brief comment about something I had done that wasn't terribly successful. As I said, I lacked enough data to get good results, and the results I did get were unhelpful; instead of being grouped with someone I could get more data to test with, it was grouped with a work I knew nothing about. I was hoping, and somewhat expecting, that it would be the editor Farnsworth Wright, who was known to write anonymously and may have needed to fill a hole in the first issue. This is at least evidence against that.

1027Petroglyph
Apr 26, 2022, 10:43 pm

>1019 faktorovich: How can you object to a lack of citation in these abstract cases

Alright, I guess I owe you an explanation of this particular piece of internet slang: citation needed is a short comment on claims that are perceived as outlandish, poorly supported, or just plain dubious in general. It means "I cannot trust this to be true". Alternatively "Dis sum bullshit".

See? By disregarding non-denotational meaning you've completely missed the point of those "citation needed" comments in >113 Petroglyph:. You've taken them at face value, their surface meaning ("give me specific page numbers"). But anyone who shares in the relevant cultural background would have understood the actual, intended meaning.

Now imagine some 1600s references that you're just not getting. Imagine a Renaissance in-joke that you're missing. Even if it's only one per book. Or one per act. Just imagine...

while also criticizing me for citing the dictionary definitions for words that you guys are misusing?

I can't speak for others, but I'm not criticizing you for merely "citing" dictionary definitions. I'm criticizing you for inappropriately substituting them for words in contexts where your substitutions miss so much non-denotational meaning that the sentences / claims become nonsensical. Then, after having made the sentence nonsensical yourself with an inappropriate substitution with an ill-fitting replacement, you declare that this deformed version is an accurate semantic equivalent of what the original poster meant to say. You go on to proclaim the original sentence nonsensical, and end with pretending you can now safely dismiss the original sentence as nonsense. That is the crux of this particular matter.

1028Petroglyph
Apr 26, 2022, 10:46 pm

>1012 prosfilaes:

Nicely done!

It'd be good if you could get some other texts by these authors, especially Solomon. Do you have any idea of the average word count for these texts?

1029prosfilaes
Apr 26, 2022, 11:05 pm

>1028 Petroglyph: They range from 1000 to 8000 words, averaging 4000. I have raw scans for Weird Tales, but no transcriptions except for the really big people. As for Solomon, it's possible he was the New York druggist who showed up in the 1940 Census; ISFDB doesn't have any other works, nor does Worldcat. I really think I'm out of luck for him.

1030Petroglyph
Apr 26, 2022, 11:11 pm

>1018 Keeline:
>1025 Stevil2001:

I quickly pulled these texts from PG (and removed the legalese, title pages, tables of contents, etc. etc.):

Jack Snow: The magical mimics in Oz & The shaggy man of Oz
Robert J. Evans: Dorothy's mystical adventures in Oz
Robert J. Evans & Chris Dulabone: Abducted to Oz & The forest monster of Oz

That brief fragment appears to not fit with any of these, either, as it sits all alone in the bottom right corner.

1031Keeline
Apr 27, 2022, 11:01 am

>1012 prosfilaes:

They have full text for some issues. I don't know if they have been proofread and corrected for the usual OCR-generated problems. But maybe you can find some more Weird Tales content to analyze here.

https://archive.org/search.php?query=%22Weird+Tales%22&sort=date&and=med...

James

1032faktorovich
Apr 27, 2022, 1:18 pm

>1022 Petroglyph: All of you are consistently attempting to summarize my personality and "traits". There is no room for personality in a scholarly discussion. Focus on the meaning of what I am stating, and not on whether my criticism is too bossy for a female author.

1033faktorovich
Apr 27, 2022, 1:32 pm

>1026 prosfilaes: If your attribution method summarizes the data by forming a tree between any type of set of texts you feed into it; it is a faulty method. This particular method is faulty because it compares matching words between texts, and most texts have some words in common, so if only those common words are compared, any group of texts will have at least some of those words in common and they will look like a connected (through those words) tree. In contrast, my most-common-words test shows the patterns of the top-6 most common words in each text and most writers have preferred patterns that distinguish them from other writers. And it is generally irrational to place attribution results on a tree, as the data should be distinguishing texts into similar groups or clusters or individual texts. If your diagram automatically connects all texts to each other, it is designed to make all texts seem indistinguishable in style from each other, or to cloud multi-byline matches with uncertainty through their similarity in some words to all other texts in the corpus.

1034faktorovich
Apr 27, 2022, 1:45 pm

>1027 Petroglyph: The link to which you connected regarding "citation needed" states that it is used on Wikipedia to point to a lack of cited sources, which has also been utilized as slang for what you are describing by some users. New slang or insider-slang that is invented by a small group of users and is only explained to insiders is not something that you can seriously expect anybody outside of your made-up-language-group to know. Just as you have now explained what you guys mean by "citation needed", as I have been researching different insider slang/ strange usages the Renaissance Workshop used, I have been figuring out what their expressions meant beyond their surface meaning. For example, they might give a definition in a rare book that then explains the repetitions of a phrase like "Go to--".

If any sentence you write becomes nonsensical when words within it are substituted with denotation-based synonyms, then your sentence was nonsensical and you just did not understand what the words you were using meant. There are no connotative meanings that override the underlying denotations. If you say "citation", you can be thinking "b**l-s**t", but that's not what the term "citation" means; thus, if you make a sentence using the latter erroneous meaning as in, "Your statement is citation", it would still remain nonsensical if one substitutes what you said with a synonym as in, "Your statement is reference".

1035susanbooks
Modificato: Apr 27, 2022, 2:10 pm

>1034 faktorovich: How do you purport to study early modern English when your grasp of contemporary language is so weak? You make a big deal of people using "tl;dr", "citation needed" -- these are instances of contemporary English usage. Rather than argue with them, why not say, "Thanks, I hadn't heard that one" and move on. Your incomprehension at language's operations has nothing to do with being a native speaker or not. Instead, it points to a rigidity of mind that is astonishing, I think, to most of us here. To be a good scholar, teacher, conversationalist, requires flexibility and humility. You lack both & accuse us of being at fault for your ignorance. If no other post has doomed your intellectual reputation, 1034 has.

I want to be nice to you. You make it so hard.

1036Petroglyph
Apr 27, 2022, 3:09 pm

>1035 susanbooks:
Admirably put

1037Petroglyph
Apr 27, 2022, 3:13 pm

In the interest of testing a method for questions we already know the answer to, I propose today's Lunch Break Experiment (tm) (though technically it involved a Lunch hour and now a pre-dinner baking potatoes hour). If I run R:Stylo on a series of texts whose authorial attributions are known, more or less, does the software produce the expected results, and if not: why not? (In all honesty: I also wanted to generate some Bootstrap Consensus Trees, because they are neat.)

So in today's episode we're going to take a look at some of the texts from the New Testament. Specifically: from the Latin-language Vulgate.


Corpus

I downloaded this corpus containing most of the books of the New Testament; these were originally sourced from The Latin Library. To download the corpus yourself, you need to log in to Github, then press the green "Code" button and select "download zip".

Note: Several texts have been split into two or more files: 1 Corinthians (cor i 1 and cor i 2), Revelation (ap 1 and ap 2).


Individual dendrograms

Before I get to the Bootstrap Consensus Trees, I kind of have to explain what they are and under which conditions they are useful.

To begin with, let's generate a series of normal dendrograms (tree graphs, cluster graphs) for the same corpus but at different Most Frequent Words -- or in this case, Most Frequent Character groups of 3 characters.

The settings in Stylo: I set the language to Latin, and on the Features tab I selected characters instead of words, in groups of 3. I then set the minimum MFC to 100, the maximum to 1000, and the increment to 300. This means that Stylo will start generating dendrograms using the 100 Most Frequent Character trigrams and, in steps of 300, stop at the 1000 most frequent trigrams.

In other words, four dendrograms were generated, at 100 MFC, 400 MFC, 700 MFC and 1000 MFC. I've put them below. Right now, I'm not going to comment on the individual groupings, but I want to say a few things about the general shape of the trees first. (Open in new tab for larger images)




You can see that the clusters of "leaves" of this tree (the individual texts) don't really change all that much: the synoptics cluster together; the gospel of John is separate from the synoptics; the Pauline letters cluster together; also constant is the initial split, which is between the more narrative texts (gospels, acts, revelation) and the advice-cum-philosophy epistles. But we'll get back to these in a bit.

What does change is the overall shape of the tree -- the way the branchings go from the basic split between narrative/non-narrative texts and the eventual text clusters.

This is due to the fact that 100 MFC (or 100 MFW) measures slightly different things than 400, 700 and 1000 MFC (or MFW). In general, the 100 most frequent words will contain a larger proportion of function words than the larger MFW groups, which will be mainly content words. From the other end, at 1000 MFW more "noise" will be included -- comparatively rare words may have an outsize effect on the calculations. Deciding on the right MFW involves balancing these two extremes. The probabilities change a little at various levels of magnification. But this introduces a subjectivity that may be undesirable.
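
(For anyone who wants the gist of this step without installing R and Stylo, here is a rough Python sketch of what a single cluster run at a given MFC cutoff does: character-trigram frequencies, z-scored, a mean Manhattan distance, and average-linkage clustering. This is a simplified analog, not Stylo's actual code, and the corpus directory name is a placeholder.)

    from collections import Counter
    from pathlib import Path

    import numpy as np
    from scipy.cluster.hierarchy import dendrogram, linkage
    from scipy.spatial.distance import pdist

    def trigram_freqs(text):
        # Relative frequencies of character trigrams, whitespace normalised.
        text = " ".join(text.lower().split())
        grams = Counter(text[i:i + 3] for i in range(len(text) - 2))
        total = sum(grams.values())
        return {g: n / total for g, n in grams.items()}

    # "corpus" is a placeholder directory of plain-text files.
    texts = {p.stem: trigram_freqs(p.read_text(encoding="utf-8"))
             for p in Path("corpus").glob("*.txt")}
    names = sorted(texts)
    pooled = Counter()
    for freqs in texts.values():
        pooled.update(freqs)

    for mfc in (100, 400, 700, 1000):
        # Features = the mfc most frequent trigrams across the whole corpus.
        features = [g for g, _ in pooled.most_common(mfc)]
        matrix = np.array([[texts[n].get(g, 0.0) for g in features] for n in names])
        z = (matrix - matrix.mean(axis=0)) / (matrix.std(axis=0) + 1e-12)
        tree = linkage(pdist(z, "cityblock") / len(features), method="average")
        leaf_order = dendrogram(tree, labels=names, no_plot=True)["ivl"]
        print(mfc, "MFC leaf order:", leaf_order)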


Bootstrap Consensus Tree

So. In order to solidify some of that unwanted and fuzzy variability, Stylo offers a "Bootstrap Consensus Tree" -- a tree diagram that takes multiple snapshots at various MFW, and keeps only those branchings that are part of the majority of those snapshots. The image below is what that looks like. I generated this image by setting the minimum MFC to 100, the maximum MFC to 1000, and the step to 50. This means that I told Stylo to generate a total of 19 trees, at 100 MFC, 150, 200, 250, ... 1000 MFC. On the Statistics tab, I selected Consensus Tree, and set the consensus strength to 0.7. This means that only those branchings were kept that featured in 70% of the trees. Anything below that is excluded from the final dendrogram. (Open in new tab to embiggen.)



(Important side note: in this type of graph distance does not correlate with similarity/difference.)
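
(And a correspondingly rough Python sketch of the consensus idea itself: run the same clustering at many cutoffs and report only the pairings that survive 70% of the runs. Again a simplified analog rather than Stylo's bootstrap consensus tree algorithm; the corpus directory and the choice of four flat clusters per run are placeholder assumptions.)

    from collections import Counter
    from itertools import combinations
    from pathlib import Path

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import pdist

    def trigram_freqs(text):
        text = " ".join(text.lower().split())
        grams = Counter(text[i:i + 3] for i in range(len(text) - 2))
        total = sum(grams.values())
        return {g: n / total for g, n in grams.items()}

    texts = {p.stem: trigram_freqs(p.read_text(encoding="utf-8"))
             for p in Path("corpus").glob("*.txt")}
    names = sorted(texts)
    pooled = Counter()
    for freqs in texts.values():
        pooled.update(freqs)

    cutoffs = list(range(100, 1001, 50))          # 100, 150, ..., 1000 MFC
    together = Counter()
    for mfc in cutoffs:
        features = [g for g, _ in pooled.most_common(mfc)]
        matrix = np.array([[texts[n].get(g, 0.0) for g in features] for n in names])
        z = (matrix - matrix.mean(axis=0)) / (matrix.std(axis=0) + 1e-12)
        # Cut each tree into 4 flat clusters (an arbitrary placeholder choice).
        groups = fcluster(linkage(pdist(z, "cityblock"), "average"),
                          t=4, criterion="maxclust")
        for a, b in combinations(range(len(names)), 2):
            if groups[a] == groups[b]:
                together[(a, b)] += 1

    threshold = 0.7 * len(cutoffs)
    for (a, b), n in together.most_common():
        if n >= threshold:
            print(f"{names[a]} + {names[b]}: together in {n}/{len(cutoffs)} runs")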


Results

Right. Let's see how well the text groupings produced by this method correspond with what we know about authorship in the New Testament. Overall, the tree produced here is entirely in line with expectations and seems pretty damn accurate.

  1. The three synoptic gospels (mr, mat, luc) form a single tight cluster -- which is entirely to be expected, given how much material they share, sometimes verbatim. The gospel of John (io 1-3 in orange) is very different, and forms a separate branch some distance away from the synoptics.

  2. Traditionally, the "John" who wrote the book of Revelation (ap 1 and ap 2) was identified with "John the apostle", who supposedly wrote the gospel of John. Modern scholarship no longer takes that view. And indeed: the consensus tree shows a very clear separation between the gospel (io 1-3 in yellow) and Revelation (ap 1-2 in green).

  3. Luke-Acts, despite layers of revisions, is likely by the same author. This graph, however, separates Acts (act 1-3 in red) from Luke, though the former is closest to the synoptics cluster. I suspect that the reason Luke and Acts aren't on the same branch may be because of all the material Luke shares with Mark and Matthew. But I'm not an NT scholar, so I'll refrain from making claims I cannot possibly back up.

  4. The epistle of James (iac, in black) is on its own branch, indicating a separate author from all the others.

  5. The cluster in the top left contains Romans and first and second Corinthians -- three letters that are pretty universally seen as from the hand of Paul.

  6. Finally: notice how this graph clearly stretches between two higher-level clusters: the more narrative texts at the bottom (gospels, acts, revelation) versus the epistles at the top.



Conclusion

So there you have it: Nearly all of this fits exactly with what we already know to be the case. Conclusion: This method can provide reliable results that we may apply to questions where the answers aren't as well studied.


Just one more thing

Just for shits and giggles I decided to perform one final test.

Because the original corpus did not include any letters that are almost-universally regarded as not by Paul, I decided to download those myself from The Latin Library (same source as the original corpus). I grabbed First Timothy, Second Timothy, and Titus.

Here is that consensus tree (same settings as last time) with those three letters added (clearly marked as not part of the original corpus):



Neat! The software places the three pastoral letters together with the other epistles, close to Paul but not on the same branch, suggesting their style may be similar to / inspired by Paul but not quite by Paul. It looks like James is closer to Paul's letters, at least in terms of MFC trigrams. I'm not going to comment on whether or not those three pastoral letters are by the same author -- I'm not an NT scholar, after all, and I don't really have a dog in this race, either.

1038Petroglyph
Modificato: Apr 27, 2022, 5:04 pm

Also, while I was dicking around with Dickens -- well, with a corpus of 16 novels by Charles Dickens -- I noticed this:



(The Stylo settings are consensus tree, most frequent character 4-grams. Generated from all the trees from 50 MFC through 5000 MFC, in steps of 50 -- that means 100 intermediate trees. Only branches that occur in 70% of all individual dendrograms are kept.)

This tree has a structure to it:
  1. At the top left, there's works published in the 1830s.
  2. Next, some works from the early 1840s, where A tale of two cities (1859) is the outlier.
  3. Next is a group consisting of Expectations (1861) and Copperfield (1850).
  4. Next is Chuzzlewit (1844) and Dombey (1848).
  5. Next: Hard times (1854) and Little Dorrit (1857).
  6. And finally, there's Bleak house (1853), Our mutual friend (1865), and the unfinished Drood (1870).

You guys. You guys! This tree shows how Dickens' style changed during his lifetime! There's an undeniable trend from early works through late works! A few works buck this trend, but you can definitely see an early-Dickens cluster and a late-Dickens cluster, with mid-Dickens scattered in between!
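
(If anyone wants to put a number on that trend instead of eyeballing a tree: a quick, hedged Python sketch that correlates pairwise stylistic distance with pairwise distance in publication year. The file names and the feature choice are placeholder assumptions, and this is a back-of-the-envelope check rather than anything Stylo computes; the pairs are not independent, so a proper test would use a Mantel-style permutation.)

    from collections import Counter
    from itertools import combinations
    from pathlib import Path

    import numpy as np
    from scipy.stats import spearmanr

    # Placeholder file names -> publication years.
    novels = {"pickwick": 1837, "chuzzlewit": 1844, "copperfield": 1850,
              "bleak_house": 1853, "little_dorrit": 1857, "expectations": 1861,
              "mutual_friend": 1865, "drood": 1870}

    def fourgram_counts(path):
        text = " ".join(Path(path).read_text(encoding="utf-8").lower().split())
        return Counter(text[i:i + 4] for i in range(len(text) - 3))

    counts = {name: fourgram_counts(f"{name}.txt") for name in novels}
    pooled = Counter()
    for c in counts.values():
        pooled.update(c)
    # Use the 500 most frequent character 4-grams of the corpus as features.
    features = [g for g, _ in pooled.most_common(500)]

    def vector(c):
        total = sum(c.values())
        return np.array([c.get(g, 0) / total for g in features])

    vectors = {name: vector(c) for name, c in counts.items()}

    style_gaps, year_gaps = [], []
    for a, b in combinations(novels, 2):
        style_gaps.append(float(np.abs(vectors[a] - vectors[b]).mean()))
        year_gaps.append(abs(novels[a] - novels[b]))

    rho, p = spearmanr(style_gaps, year_gaps)
    print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")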

Edit: a date

1039Keeline
Apr 27, 2022, 4:48 pm

>1038 Petroglyph:

1959 => 1859 perhaps?

Very interesting though, both of them.

James

1040Petroglyph
Apr 27, 2022, 5:04 pm

>1039 Keeline:
Good catch, thanks. I've updated the original post

1041faktorovich
Apr 27, 2022, 9:01 pm

>1035 susanbooks: I finally figured out what has been going on in this discussion. You guys have been attempting to find a path towards being "nice" to me. And it has been "hard" for you because you keep expecting me to just have "flexibility and humility" in agreeing with everything you say, and not pointing out your errors. And the complexity of my arguments is making it difficult for you to explain how your constant insults are really your attempts to begin to be "nice" (later, or eventually).

1042faktorovich
Apr 27, 2022, 9:35 pm

>1037 Petroglyph: The only thing you are proving is that you manipulated the results to match the little that you learned about the authorship of these fragments from the Bible. You admit that you have not done significant research to understand the reasons for the current attributions. And you fail to explain the entire complexity of the interconnections in these trees you are drawing. You do not give any raw data, or even general data. Your strategy is that readers will believe that you are stating truthful things about the data, and that you have not manipulated it with the single goal of deriving the conclusion that fits with what some scholars have previously claimed about their attributions. Your goal is clearly to be "right" in your attributions, and not learning the truth about these fragments' authorship. If you step back from what you are saying and think about it, you might notice how absurd it is. The Vulgate is generally believed to be a 4th century Latin translation of the Bible from other languages. Thus, its linguistic style across all of its sections should reflect the style of its (probably) single translator, and not the original styles in other languages of the apostles like John and Paul, who you claim your analysis have confirmed wrote fragments of the text you tested. It could only be more absurd if your conclusion to testing some fragments was that Jesus or Moses wrote them, and that you had proven this with computational-linguistics. In contrast, I tested 284 different texts from the lifespan on the writers I have identified as the underlying Renaissance writers. I have not only performed not only 27 different quantitative tests, but also an overwhelming volume of research to explain my attributions through evidence that includes confessional self-attributed letters of the ghostwriters. I tested "Bancroft" and various other theological bylines to make sure none of these other bylines could have been the actual translators behind the English King James Bible. I am now working on translating Verstegan's "Restitution", and finding overwhelming proof that confirms he had to have been the main Bible translator as well. I already tested the James Bible in parts and have explained what these signify in Volumes 1-2 of BRRAM. Testing Latin or all other earlier variants would be an entirely different project that would have to consider the various bylines that worked in those earlier centuries. There is far more to be learned for me and my readers in focusing on dissecting "Restitution", and its many firsts (ground-breaking firsts that could only have been achieved by the top researcher of the Renaissance, and not a minor Catholic publisher, as Verstegan is claimed to be by most who have mentioned him in previous scholarship).

1043faktorovich
Apr 27, 2022, 9:39 pm

>1038 Petroglyph: Again, this can just be your exercise in drawing drees, and you might not have actually tested any of these texts, as you are not giving the raw data of which shared/ divergent words led to this oddly precise conclusion. I did not find any such clear-cut changes in an author's style with age, as instead I have found that style remains consistent for a professional author across a career. If there is a chronological change, it might mean that Dickens co-wrote later or earlier in his career, and this hypothesis can only be tested by including other bylines that Dickens published as a publisher in this corpus.

1044prosfilaes
Apr 27, 2022, 10:10 pm

>1042 faktorovich: The only thing you are proving is that you manipulated the results to match the little that you learned about the authorship of these fragments from the Bible.

Life's easier if you assume that everyone else just manipulated their results. You can prove pretty much anything by assuming other people are lying.

Your strategy is that readers will believe that you are stating truthful things about the data, and that you have not manipulated it with the single goal of deriving the conclusion that fits with what some scholars have previously claimed about their attributions.

It worked. In part because I know that I could do the exact same thing given an hour or so. Also, that whole pseudonym thing works in their favor; I see no reason to think that anyone outside this thread will ever know of those results, no greater glory will accrue to them as a result of this work, so why rig the tests?

1045faktorovich
Apr 28, 2022, 1:12 am

>1044 prosfilaes: When a researcher starts claiming they have attributed texts to the Apostles of God, the only rational conclusion is that results have been manipulated.

I absolutely do not think it is possible for everybody to be lying, as at some point most people must be saying some things that are true.

"It" has clearly worked for all previous computational-linguists in the Renaissance field, as exemplified by "The New Oxford Shakespeare: Authorship Companion": https://global.oup.com/academic/product/the-new-oxford-shakespeare-authorship-co...; I have done close reviews of the research presented in this and other similar studies and have explained how they have manipulated results and failed to present the raw data for these manipulations to be even more obvious, based on how they themselves describe their procedure (including researching too few texts/ bylines). While those presenting tests in this thread would not make any money from running tests on insignificant texts; the computational-linguists are obviously paid a lot of money by Oxford, other publishers, and other entities to reach the attribution conclusions that an entity wants, and not to arrive at the true attribution. Proving that all previous "Shakespeare" scholars have been wrong about even the existence of a man called "Shakespeare" is not something that any self-interested "Shakespeare" scholar/ editor/ publisher would want to sponsor. It is mesmerizing how I have presented overwhelming evidence to support my claims, and everybody in this thread is more likely to be mesmerized into believing the voice of God has been authenticated than just looking at my data to see for themselves it is genuine.

1046Petroglyph
Apr 28, 2022, 2:11 am

I wish changing the font were possible in Talk, because this is a case for Comic Sans if I ever saw one. Ah well. Failing that:

>1042 faktorovich:

tHe oNlY ThInG YoU ArE PrOvInG Is tHaT YoU MaNiPuLaTeD ThE ReSuLtS



ThE LiTtLe tHaT YoU LeArNeD AbOuT ThE AuThOrShIp oF ThEsE FrAgMeNtS FrOm tHe bIbLe

Hey, look at the time! We're overdue another reminder that Faktorovich doesn't know a thing about my degrees (or whether I have any), my educational history (or lack thereof), my professional history (or lack thereof), or even my L1.

yOu dO NoT GiVe aNy rAw dAtA, oR EvEn gEnErAl dAtA.

The same lies you were spinning in >296 faktorovich: and >311 faktorovich: in the Baum-vs-Thompson-of-Oz era of this thread. Post #1037 links you to the corpus, and the necessary Stylo settings are behind the spoiler tags.

But as long as you keep throwing out these lies, you'll have a fig-leaf reason not to step up and actually show your work to someone who knows what they're talking about, and who knows what you're trying to talk about. Keep hiding in your comfort zone.

YₒuR sTRATEGY is ThAT REAdERs wiLL bELiEVE ThAT Yₒu ARE sTATiNG TRuThfuL ThiNGs AbₒuT ThE dATA, ANd ThAT Yₒu hAVE NₒT mANiPuLATEd iT wiTh ThE siNGLE GₒAL ₒf dERiViNG ThE CₒNCLusiₒN ThAT fiTs wiTh whAT sₒmE sChₒLARs hAVE PREViₒusLY CLAimEd AbₒuT ThEiR ATTRibuTiₒNs. YₒuR GₒAL is CLEARLY Tₒ bE "RiGhT" iN YₒuR ATTRibuTiₒNs, ANd NₒT LEARNiNG ThE TRuTh AbₒuT ThEsE fRAGmENTs' AuThₒRshiP

There's always a deeper conspiracy.

tHuS, iTs lInGuIsTiC StYlE AcRoSs aLl oF ItS SeCtIoNs sHoUlD ReFlEcT ThE StYlE Of iTs (PrObAbLy) SiNgLe tRaNsLaToR, aNd nOt tHe oRiGiNaL StYlEs iN OtHeR LaNgUaGeS

... Assuming that all of the individual texts are written in equally proficient and literary Greek, and assuming just so many other things I don't want to go into right now.

aPoStLeS LiKe jOhN AnD PaUl, WhO YoU ClAiM YoUr aNaLySiS HaVe cOnFiRmEd wRoTe fRaGmEnTs oF ThE TeXt yOu tEsTeD

I've never claimed anything of the sort. Either your reading comprehension is severely lacking, or you could be doing this on purpose -- to accompany your shit-smearing lies.

iN CoNtRaSt, I TeStEd 284 dIfFeReNt tExTs fRoM ThE LiFeSpAn oN ThE WrItErS I HaVe iDeNtIfIeD As tHe uNdErLyInG ReNaIsSaNcE WrItErS. i hAvE NoT OnLy pErFoRmEd nOt oNlY 27 dIfFeReNt qUaNtItAtIvE TeStS, bUt aLsO An oVeRwHeLmInG VoLuMe oF ReSeArCh tO ExPlAiN My aTtRiBuTiOn ..........

Yeah, yeah. Keep telling yourself that. That's how you convinced yourself, and perhaps if you keep repeating your bullshit it'll maybe only be a matter of time before your cOnTrIbUtIoNs tO HiStOrY AnD ThE HiStOrY Of lItErAtUrE become widely recognized.

1047Petroglyph
Modificato: Apr 28, 2022, 2:22 am

prosfilaes, I hope you don't mind me butting in here, but she's kinda slagging me off. I'll be quick, though.

>1045 faktorovich: When a researcher starts claiming they have attributed texts to the Apostles of God



pRₒViNG ThAT ALL PREViₒus "ShAkEsPEARE" sChₒLARs hAVE bEEN wRₒNG AbₒuT EVEN ThE ExisTENCE ₒf A mAN CALLEd "ShAkEsPEARE" is NₒT sₒmEThiNG ThAT ANY sELf₋iNTEREsTEd "ShAkEsPEARE" sChₒLAR/ EdiTₒR/ PubLishER wₒuLd wANT Tₒ sPₒNsₒR. IT is mEsmERiziNG hₒw I hAVE PREsENTEd ₒVERwhELmiNG EVidENCE Tₒ suPPₒRT mY CLAims, ANd EVERYbₒdY iN This ThREAd is mₒRE LikELY Tₒ bE mEsmERizEd iNTₒ bELiEViNG ThE VₒiCE ₒf gₒd hAs bEEN AuThENTiCATEd ThAN JusT LₒₒkiNG AT mY dATA Tₒ sEE fₒR ThEmsELVEs iT is GENuiNE



If I may paraphrase from Titus Andronicus, act 4, scene 2 (by Edmund "Walter Raleigh" Spenser, son of William "Shakespeare" Drummond; their pronouns were thou, thee, thy, thine):

FAKTOROVICH:
thou hast not even looked at our data
PETROGLYPH:
Villain, I have undone thy data

1048anglemark
Apr 28, 2022, 7:36 am

>1047 Petroglyph:

(coffee-on-the-keyboard style rejoicing.)

Here is something I have been pondering since last night when I read >1037 Petroglyph: could a 3-character n-gram analysis be a stronger indicator of individual style in a highly inflected language with a less strict word order, such as Latin or Old English, compared to a language such as Present-Day English? I confess that my statistics skills are not as strong as I'd like them to be, and my intuition isn't worth more than the pixels on the screen, but it seems like it might be the case.

-Linnéa

1049Petroglyph
Apr 28, 2022, 11:19 am

>1048 anglemark: could a 3-character n-gram analysis be a stronger indicator of individual style in a highly inflected language with a less strict word order

Interesting question!

I'm not sure, actually. I think so? I'd have to look it up when I have time.

The "how to use stylo" pdf linked here mentions that certain distance measures are more appropriate for highly inflected languages (search for "inflec"), specifically, the ones that lend a little more weight to comparatively rare words -- the same lemma with different case / verb endings. (the authors also note that this only really works with a limited very small set of MFW.)

Spontaneously, I want to say that looking at character trigrams kinda steers around that problem, since you'd be taking snapshots throughout the stem and the ending, but I wouldn't trust my intuition in these matters, either. I'll have to look it up.

And quickly re: word order:

Unless you do word ngrams (higher than 1, obvs), word order wouldn't matter: the individual word frequencies are independent of surrounding words. Stylo includes a dataset containing the word frequencies of a set of copyrighted novels (JK Rowling, her pseudonym Robert Galbraith, Harlan Coben, CS Lewis and JRR Tolkien; another training dataset has novels from Harper Lee and Truman Capote). Not the actual novels, just a table with all the word frequencies. This is how a lot of copyrighted material is shared in data mining -- you can spread alphabetized lists of all the words in a book, or, like here, frequencies, just not the texts themselves.

(This removes some analysis types from your arsenal: word n-grams, and also sampling.)
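
(Concretely, the kind of table that gets shared can be produced with a few lines of Python; the file names here are placeholders. The point is that such a table supports frequency-based analyses, but the original word order, and hence the text itself, cannot be read back out of it.)

    import csv
    import re
    from collections import Counter

    # Placeholder input and output file names.
    with open("novel.txt", encoding="utf-8") as fh:
        counts = Counter(re.findall(r"[a-z']+", fh.read().lower()))

    with open("novel_frequencies.csv", "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["word", "count"])
        for word, count in counts.most_common():
            writer.writerow([word, count])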

But yes: I think for word n-grams the strictness of word order would matter, especially for a higher n. At that point, though, it'd be more informative to use a POS tagger and run an analysis on that level!

1050faktorovich
Apr 28, 2022, 12:34 pm

>1046 Petroglyph: When you contort the font of a quote to make it extremely difficult to read, you are explaining that your goal is to censor or to prevent readers from being able to or being allowed to comprehend its actual intention. Instead you are attempting to make your own falsehoods sound believable by comparison, as they are mostly in a readable font. Indeed you are deliberately attempting to make sure that all of the readers of this post feel as if their "reading comprehension is severely lacking" as they attempt the painful process of reading letters with erroneous capitalization.

I have "actually" shown my "work", or the full raw and processed data set(s) for each experiment I have described in this thread. You are the one who is not showing the data, but rather only showing visualizations of your conclusions. If you keep accusing me of what you are guilty of, you are just adding falsehoods on top of past falsehoods.

I have not made a single false statement in this thread, and I don't believe I have ever made a false statement in my life. I have fully documented and explained in the BRRAM series why my attributions are precisely correct. I have been answering all questions and comments in this discussion to help anybody who might have misunderstood my research to understand it better.

1051faktorovich
Apr 28, 2022, 12:41 pm

>1047 Petroglyph: There was a Simpsons episode about my "Shakespeare" re-attribution findings?

You have not "undone" (definition: reversed, cancelled, not done) my data, as you have not hacked into my GitHub to reverse or alter the results, neither have you blocked access or cancelled my GitHub, nor have you managed to travel back in time to stop me from having done the data.

1052faktorovich
Apr 28, 2022, 12:53 pm

>1048 anglemark: An example of a 3-character n-gram is "ana" or "par". Inflection is the change of words or their endings based on tense etc. Latin is on the more-inflected side of the scale, but English is on the other less-inflected side (that's why when Latin words are Anglicized, their endings tend to be deleted, and this pattern of Anglicization was similar during the Renaissance; Old German/ Old English was somewhat more inflected than Early Modern English, but as you guys have said few examples of Old English survived). A highly inflected language might have more similar endings (-ing/-est), but this depends on the rules of the individual languages being compared and not generally on the degree of inflection (as some inflection does not change more than a single letter in a word). The top-6 most common characters capture these larger changes in the patterns of letter usage, without over-stressing a suffix change or other structural language variations.

1053faktorovich
Apr 28, 2022, 1:01 pm

>1049 Petroglyph: As I demonstrated when I tested Stylo, that system produces many glitches that include corrupted data outputs that garble up words in the text it processes (even in cases where these words are legible to all other data processing software). Thus, using a frequency list created by Stylo (without uploading the texts for one's self and running a frequency test on it with alternative software that does not generate glitches) is likely to introduce errors that would skew the outcome of the analysis. Similarly, simply plugging texts into a program somebody else built and spewing out the graphs it generates as your own research is a faulty research method that is designed to minimize research-labor, and to minimize accuracy-of-results.

1054susanbooks
Apr 28, 2022, 1:37 pm

>1041 faktorovich: Yes, being nice & respectful to others is how we manage to interact meaningfully. Being respectful of one's materials & other people's work, even as we challenge them, is how intelligible scholarship takes place. How is this a new concept for you? You have no more respect for your materials & previous scholars than you do for the people in this thread. People have been trying to engage with your ideas; you insult, obfuscate, and inundate with walls of text in response. That's not useful dialogue. That's you not understanding how to act in society.

I tell my students that when they write they always need to imagine a resisting reader, one who is as smart & sane as they are but just doesn't agree with a thing they say. Our job as writers is to respectfully & convincingly address that reader's questions. You just yell, insult, misunderstand then blame us for your incomprehension rather than asking for clarification. Again, I'm astonished you were able to function enough in academia to get a doctorate.

And the odd font in >1046 Petroglyph: isn't meant to make reading difficult. The font itself carries a meaning. Everything you don't understand you lash out against. Again, why not just ask why someone uses an unfamiliar word or way of communicating rather than immediately attaching your own wrongheaded definition & then ranting at length? You waste everyone's time tilting at windmills. Your ignorance is not the limit of every single conversation.

1055anglemark
Apr 28, 2022, 4:47 pm

>1049 Petroglyph: Hmm. Yes, I understand – I was thinking about how the case inflections might result in a less varied set of character 3-grams, which might be counterbalanced by the word order, but I'm probably thinking too much like a human :-) Thanks! (Just ftr, I did a quick comparison of the KJV Gospel of Luke from Gutenberg and of the Vulgate Evangelium Secundum Lucam, and found that a) the English text has 25939 tokens, 2387 types, and 186 unique 2-character word endings, b) the Latin text has 18062 tokens, 4534 types, and 135 unique 2-character word endings. I used the Wordlist generator at https://www.reuneker.nl/files/wordlist/ .)
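
(Those counts are easy enough to reproduce locally for anyone without the web tool; a small Python sketch follows, with a placeholder file name, and with the caveat that the exact figures will depend on how tokens are defined.)

    import re

    # Placeholder file name, e.g. a plain-text Gospel of Luke.
    with open("luke.txt", encoding="utf-8") as fh:
        tokens = re.findall(r"[^\W\d_]+", fh.read().lower())

    types = set(tokens)
    endings = {t[-2:] for t in types if len(t) >= 2}

    print("tokens:", len(tokens))
    print("types:", len(types))
    print("unique 2-character endings:", len(endings))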

Part-of-speech / word order analysis should definitely be more relevant.

1056faktorovich
Apr 28, 2022, 9:11 pm

>1054 susanbooks: Petroglyph just exploded in a tirade of slang and insults, and you are concerned with me not being "nice & respectful"? The double-standard is clearly because I am attempting to overturn a very old, very white, and very bourgeois field. There is nothing "intelligible" or scholarly about Petroglyph's miss-capitalized babbling in the last few posts. You have all been attempting to ignore everything I have been saying in the most disrespectful manner possible, as you instead interject your own false and unsubstantiated claims with the insistence that they are simply superior due to them being generally believed by the public, and thus deserving space to be re-repeated even if they reflect very old and very false ideas.

You have summarized the errors all of you are making in this discussion when you state: "You just yell, insult, misunderstand then blame us for your incomprehension rather than asking for clarification."

The use of erroneous capitalization is not an example of subtext that carries "meaning", but rather an example of incompetent linguistic usage magnified to make it seem to be an absurdist literary device. It is extremely pompous to endow such nonsensical capitalization errors with any philosophical hidden meaning. You clearly now prefer to digress into such abstractions over addressing any rational subjects because I have successfully countered every semi-rational point you guys have been making.

1057faktorovich
Apr 28, 2022, 9:19 pm

>1055 anglemark: Another way of phrasing what you have figured out is that the translation into English out of Latin (and other languages) was so heavy-handed that most of these quantitative measures shifted by as much as doubling; this is why Verstegan and Harvey's linguistic styles are clearly identifiable in KJV, as they made major adjustments in the phrasing for Early Modern English. And yes, the part-of-speech measure is very useful in spotting their preferences in any genre. But word-order-analysis is an impossibly difficult measure to calculate because there are so many possible orders and combinations across a corpus with 284 texts.

1058prosfilaes
Apr 28, 2022, 11:25 pm

>996 Petroglyph: Do let me know if you think these are unproductive additions to a thread that has mostly run its course, and I'll stop posting them here and just keep them to myself.

I do think they could go better in a thread and group more on-topic, and less, um, poisoned. I don't see one outside the very general Book Talk; would a Stylo forum be too narrow?

1059Keeline
Apr 29, 2022, 12:37 am

>1058 prosfilaes: Maybe something a bit broader like "Stylometrics" or "Literary Computing" ?

James

1060bnielsen
Apr 29, 2022, 2:16 am

>1059 Keeline: I'd like to follow the experiments so please leave a link here if you take the discussion to another group.
I'm just an interested amateur, but my interest goes all the way back to the first computer generated concordances.

For the weekend or similar:
If anyone needs a really good story from that time period you should get hold of Ben Ross Schneider's "Travels in Computerland: or, Incompatibilities and Interfaces", about a guy trying to digitize The London Stage.

1061anglemark
Modificato: Apr 29, 2022, 3:36 am

>1057 faktorovich: Well, nobody said anything about 284 texts – the discussion you commented on was about highly versus less highly inflected languages in general. However, "word order analysis" (a clumsy and vague term, for which I apologise) is absolutely not impossible, but probably less relevant when looking at authorial style in Modern English, where the word order is relatively more fixed than in Latin.

For instance, if you have a POS tagged corpus of Modern English prose texts and look at all collocations of noun + adjective(s), any variation you find will primarily be governed by rules of syntax. In a corpus of Latin or Old English, the variation will be governed by other factors.
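
(As a concrete illustration on a Modern English text, a short Python sketch using NLTK's off-the-shelf tagger to pull out adjective + noun pairs. The file name is a placeholder, and the tagger data needs a one-time download, e.g. nltk.download("punkt") and nltk.download("averaged_perceptron_tagger"); resource names may vary slightly by NLTK version.)

    from collections import Counter

    import nltk

    # Placeholder file name for any Modern English prose text.
    with open("sample.txt", encoding="utf-8") as fh:
        tagged = nltk.pos_tag(nltk.word_tokenize(fh.read()))

    # Penn Treebank tags: JJ* = adjective, NN* = noun.
    pairs = Counter(
        (w1.lower(), w2.lower())
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if t1.startswith("JJ") and t2.startswith("NN")
    )

    for (adj, noun), count in pairs.most_common(10):
        print(adj, noun, count)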

-Linnéa

1062susanbooks
Modificato: Apr 29, 2022, 11:13 am

>1056 faktorovich: "The use of erroneous capitalization is not an example of subtext that carries "meaning""

Once again, you prove your inability to closely read your sources. The poster in question wrote that they would have preferred using Comic Sans but since LT didn't support it, opted for the font they used instead. Comic Sans has a meaning. A careful reader would assume that the substitute font would have a similar, somewhat synonymous meaning, as indeed it does, the disorganization of the type implying the disorganization of the thought of the person being quoted, i.e., you. I out & out gave you the answer -- that the font had meaning -- and you kept to your original, wrongheaded interpretation. Yet another damning illustration of your intellectual method.

1063faktorovich
Apr 29, 2022, 12:11 pm

>1061 anglemark: If you narrow down the general concept of "word order analysis" to use of the most common "noun + adjective" combinations, you are not really doing "word order analysis", but rather searching for common combinations of adjective-first-and-then-noun phrases. There are no word-order analysis strategies that I can imagine that would allow for systematic and consistent analysis that would not become infinitely complex when one is measuring several book-length texts. And yes, part of the problem is that most noun-adjective and other word orders are governed by syntax stating which of these should go first, versus individual preference of the authorial style.

1064faktorovich
Apr 29, 2022, 12:25 pm

>1062 susanbooks: Comic Sans (http://www.identifont.com/show?1MH) font is not that different from what seems like the Arial font used on LibraryThing. If LibraryThing allowed for font-alteration, they would probably have users applying unreadable or otherwise problematic fonts. The Comic Sans font has been controversial, as it has been subject to bans etc. for its overuse in educational content. The only meaning thus "Comic Sans" has is as a deflator of meaning or significance. It is not even an entertaining cartoonish fonts like some of those available for free on Google Fonts. There is nothing synonymous between a slightly handwriting-like font like Comic Sans and the use of miss-capitalization within words. It was indeed used to suggest "disorganization" in what I was writing, but the only thing that was disorganized was the miss-capitalization that made it difficult for all readers to be able to read words highly-organized and specific sentences I had written before this subversion disrupted them. You gave the wrong "answer" because your intention is to insult and belittle me. Thus, it was perfectly rational for me to contradict your error and to now restate my position with more specificity, so all readers will precisely understand my intended original meaning.

1065susanbooks
Modificato: Apr 29, 2022, 12:37 pm

>1064 faktorovich: "It was indeed used to suggest "disorganization" in what I was writing, but the only thing that was disorganized was the miss-capitalization that made it difficult for all readers to be able to read words highly-organized and specific sentences I had written before this subversion disrupted them"

Do your own words even make sense to yourself? Subversion has meaning, yes? Figure it out from there. It's useless to argue with someone who has no grasp whatsoever, even the most elementary, of semiotics. It's great that you've managed to find a niche despite your obvious mis-abilities. Good luck. (And stop zapping bugs -- the climate disaster is bad enough without your help.)

1066Petroglyph
Apr 29, 2022, 1:31 pm

>1058 prosfilaes:
>1059 Keeline:
>1060 bnielsen:

Honestly? Not sure if a different thread would be all that worthwhile. I may have another three or four Lunch Break Experiments (tm) that I'm tinkering with, which is, perhaps, not enough for a thread. But that depends on others' contributions, obviously.

1067Petroglyph
Apr 29, 2022, 1:52 pm

>1057 faktorovich: bUt Word-ORDeR-AnAlYsis Is aN iMpoSSIbly DiFfICulT MEasURe to CALcULatE bECauSe THerE aRe so ManY poSsIBlE ORderS aND ComBInatIOnS

The Brown Corpus, compiled in the sixties, was once the gold standard for corpus linguistics -- it had one million words, you guys! Ground-breaking, pioneering, very important for corpus linguistics and computational linguistics. Can't praise it enough.

Anyway: that corpus acquired POS tagging over the seventies and the eighties, which is where a number of early non-human, machine taggers were trained. That's forty, fifty years ago. The technology has only gotten better. TreeTagger is one obvious example.

And since I am familiar with Faktorovich's plodding, unaided, manual, copy-pasting, environment-switching, one-text-at-a-time, mathless, largely statisticsless method, I can already hear the objections: "that wouldn't work on a renaissance corpus" (false -- Shakespeare plays and plenty of others have been POS-tagged). Or she'd assume that she'd have to perform manual POS-tagging on the entirety of her corpus, instead of, you know, just doing five thousand words here and there and use those as trials and samples. It's just throwing pearls in the water at this point (I almost wrote "pearls before swine", but that would probably earn me another accusation of antisemitism.)

1068Petroglyph
Apr 29, 2022, 2:02 pm

>1055 anglemark:

So I had a quick look at this book:

Savoy, Jacques. 2020. Machine Learning Methods for Stylometry: Authorship Attribution and Author Profiling. Cham: Springer. https://doi.org/10.1007/978-3-030-53360-1.

Savoy doesn't go into detail re: highly inflected languages, but he recommends POS-tagging them with TreeTagger: depending on the language, you can download a parameter file which, in addition to tagging (assigning word forms their part-of-speech tag -- Noun, AuxVerb etc.), also assigns inflected words to their lemma. Check out the documentation for Finnish or Latin. You can then run your analyses on the lemmas.
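
Not the TreeTagger workflow itself, but a rough sketch of the same idea that is easy to run from R: the udpipe package also assigns lemmas, so you can collapse inflected forms before counting anything. The Latin model name below is from memory and may need checking against udpipe's documentation.

library(udpipe)
dl  <- udpipe_download_model(language = "latin-ittb")      # one of udpipe's pretrained Latin models
lat <- udpipe_load_model(dl$file_model)
ann <- as.data.frame(udpipe_annotate(lat, x = "In principio creavit Deus caelum et terram."))
ann[, c("token", "lemma", "upos")]
# Counting ann$lemma instead of ann$token groups all inflected forms of a word
# together, which is what you want for a highly inflected language like Latin.
head(sort(table(ann$lemma), decreasing = TRUE))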

In Savoy's chapter 8, which is a case study of Elena Ferrante, some of the analyses have been performed on lemmatized versions of the Italian-language corpus.

1069rosalita
Modificato: Apr 29, 2022, 6:00 pm

OK, I'm convinced. After reading the last two days of messages, there's no way there isn't a serious troll at work here, whether consciously or not. But that's OK; I just skip those messages and read the replies for my daily entertainment. :-)

1070Keeline
Apr 29, 2022, 2:31 pm

Although the HTML 4 <font> tag and the style property of HTML tags are removed from the talk editor, which prevents inline font selection, there are still some ways to style text beyond the usual bold, italic, strike, and underline options.

For example, when writing in Facebook groups about the 𝔇𝔦𝔰𝔫𝔢𝔶𝔩𝔞𝔫𝔡 𝔑𝔢𝔴𝔰 newspaper, I will sometimes style it like this. It is not a precise match but it is just a little different.

Facebook groups also don't allow for even italic, bold, etc. so I have to employ other methods to get that if it is desired.

I don't use fonts or emojis or animated GIFs to mock people with whom I am having a conversation. I've certainly had them used towards me and since I don't appreciate it, I don't return in kind. I will use styling to indicate a clearer sense such as indicating a book title without incorrectly using double quotes.

𝚂𝚘𝚖𝚎𝚝𝚒𝚖𝚎𝚜 𝚑𝚊𝚟𝚒𝚗𝚐 𝚖𝚘𝚗𝚘𝚜𝚙𝚊𝚌𝚎𝚍 𝚝𝚎𝚡𝚝 𝚑𝚎𝚕𝚙𝚜 𝚝𝚑𝚎𝚛𝚎.

Here on LT Talk I can use the <pre> tags.

Back in 1984-85 when the Macintosh computer was introduced and users suddenly had 8 or so fonts, they were abused for a while, especially the silly ones like San Francisco that looked like a ransom note. When Windows became common for the PC clone audience, the options for font selection beyond the traditional spread to that larger group of users.

Per Wikipedia (usual caveats apply), Comic Sans was introduced by Microsoft in 1995 and was part of Windows 95. I don't see it mentioned there but I thought that it was a default font for some Microsoft applications. Probably this is a false memory. Maybe many users just liked the novelty of something that looked like a handwritten non-cursive font. But it was one from a limited list until people figured out how to install other fonts so it was overused. Sometimes it was used in serious situations but because of its name and appearance, it is more appropriate for more casual situations. There are memes which show how the selection of a font or typeface changes the meaning of the same set of words.

I make no criticism of someone who wants to use it to emphasize their point.

But if I want to have 𝒻𝓊𝓃 𝓌𝒾𝓉𝒽 𝓉ℯ𝓍𝓉 in LT Talk there are 🅢🅞🅜🅔 🅦🅐🅨🅢 🅣🅞 🅓🅞 🅘🅣.

Note that these techniques may not be as searchable or as compatible with assistive technology (any more than Microsoft's incorrect marking of typographers' apostrophes, quotes, and em-dashes is).
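
If anyone is curious how the trick works under the hood: those "styled" letters are just ordinary Unicode code points from the Mathematical Alphanumeric Symbols block, so you can map plain ASCII onto them arithmetically. A rough R sketch; the code points are quoted from memory, so check them against the Unicode charts before relying on this:

to_monospace <- function(s) {
  cp <- utf8ToInt(s)                                   # code points of the input string
  out <- ifelse(cp >= utf8ToInt("a") & cp <= utf8ToInt("z"),
                cp - utf8ToInt("a") + 0x1D68A,         # MATHEMATICAL MONOSPACE SMALL a..z
         ifelse(cp >= utf8ToInt("A") & cp <= utf8ToInt("Z"),
                cp - utf8ToInt("A") + 0x1D670,         # MATHEMATICAL MONOSPACE CAPITAL A..Z
                cp))                                   # leave everything else alone
  intToUtf8(out)
}
to_monospace("Sometimes having monospaced text helps there.")

Which is also why such text is less searchable: to the computer it is a completely different string of characters.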

James

1071Petroglyph
Apr 29, 2022, 2:58 pm

>1056 faktorovich: The USE oF ERrOneOus CApItAlIzATIoN iS NOt An EXaMPle oF suBTExt ThAT carRIes "MEanINg", But rATHeR An exAMPle oF iNCOmPETeNT LiNGuiStIC UsaG

It's known as Spongebob capitalization. Here's an article; here's the KYM page, and here is a google images search.

Once again, you have demonstrated that you do not make a systematic distinction between "Faktorovich does not understand this" and "this is nonsensical to people other than me".

>1054 susanbooks: and >1062 susanbooks: and >1065 susanbooks: were kind enough to give you pointers -- Usually, giving people a hint and letting them figure things out themselves is preferable to straight-up telling them what to correct and how to correct it. susanbooks was being nice to you by treating you like an adult who is capable of learning and willing to at least try and find things out on their own.

Incidentally, if you ever want to generate this style of capitalization yourself (because I know if you did you'd just sit there and type these things out letter by letter), here's how to do it in R:

# First, install the necessary package
install.packages("spongebob")
# Then load the package into R
library(spongebob)
# use the tospongebob function and wrap the text to be converted between quotes
tospongebob("Message you want to convert to spongebob case")

(the hashtagged lines are comments that tell you what each line does; they are ignored when running the code)

Or, and this would be more in line with your workflow: google "spongebob meme text generator" or "spongebob capitalization generator".

>1056 faktorovich: TheRE IS NOThinG "IntelLIGiblE" Or sChoLARly ABOuT PEtrogLYph's mIsS-cApITaLiZeD BAbblinG iN The LAST fEw poSTs

You not understanding memes does not make them unintelligible.

Also, "nothing scholarly about Petroglyph's miss-capitalized babbling"?? Faktorovich, we're on an internet forum. People are allowed to post memes and say things like "fucking bullshit" here. You're making a category error in thinking we're on a scholarly forum.

Or perhaps this is another example of you focusing on the easily-mimicable external trappings of science and scientific discourse, and thinking that having them (or not) automatically confers the necessary cachet (or removes it). You do that a lot. Maybe you should pay more attention to the actual practices and approaches that proper scholars would display.

>1064 faktorovich: tHErE Is NOtHiNG syNONyMouS BEtWEeN a SLigHtLY haNDwrItinG-LIkE fONT liKe COMiC SaNS And THe uSE oF miSs-cApItaLIZaTion WitHin WOrDs

  1. It is not "mis-capitalization" when it's been explained to you a few times that this particular style of capitalization is used to suggest disorganization of thought.
  2. Comic Sans and Spongebob capitalization are synonymous in the sense that both are used to mock someone else's deeply misguided thoughts.


(In that self-same post, you can accurately say that "It was indeed used to suggest "disorganization" in what I was writing", only then you contradict yourself by calling it "miss-capitalization" (sic). In other words, it's a mistake when you want to throw shade on it, but you can accurately say what it's used for when you want to sound like you know what you're talking about. The kind of persona you're trying to project changes from sentence to sentence.)

1072Keeline
Apr 29, 2022, 3:05 pm

>1068 Petroglyph:

The Savoy book intrigues me a bit but I think the price will keep me from getting a copy for my level of interest. There are other ways I can (and should) spend US$130. Looking at the Amazon listing for it

https://www.amazon.com/dp/303053359X/

I am a little dubious of their claim that the techniques surveyed can be helpful for "detecting fake news", since that has less to do with identifying an author than with whether information is presented in a factual manner. Perhaps they are presenting a special case, one designed to detect a particular author or a piece of software that is generating content with certain traits.

I've added it to my Amazon wish list and I'll watch the price in case this goes down or is presented as a used copy at a lower price I can justify.

It seems unlikely to be available in the libraries to which I have access. The nearest major university to me is nearly an hour away and I don't have borrowing privileges with them. If it comes down to inter-library loan, that won't happen with them. It's one of the reasons I buy books I want to read rather than trust that a library will have or retain a book I might like.

There have been some abbreviations used here that I am trying to pick up on by context. For example, I know that what I might find in the Urban Dictionary for "POS" is not what is used here but probably "parts of speech." Is there a good resource for the TLAs in this field? There, I did it. TLA = "three letter acronym" :)

I am interested enough in the Lunch Break™ experiments that I have been gathering the software to make tests on texts for which I know a lot about the authorship from extrinsic evidence (contracts, letters, etc.). It would be interesting to see what they have to say. As I mentioned, long ago I implemented another system and it was pretty good at discerning the authors for very small samples in the 100-sentence range (i.e. chapter length for my kind of books). This sort of thing is a side interest. But sometimes there are situations described in the letters where an assigned ghostwriter could have subcontracted one of the stories, and I'd like to get a second opinion from some software to go with my own reading, which is colored by the bias of being aware of the letters.

As a kid I discovered some of the original Tom Swift books on the family bookshelf. I started with the two we had in dust jacket, Tom Swift and His Photo Telephone (1914) and Tom Swift and His Wizard Camera (1912). My copies were from the mid-1920s. I liked them and the four or five books without jackets we had. We also had a copy of Don Sturdy on the Ocean Bottom (1931) which was also by "Victor Appleton" and had "By the Author of Tom Swift" stamped on the spine of the book. However, when I read it as a kid, it did not seem at all like the voice writing the Tom Swift books. In the late 1990s I saw the business records for the Stratemeyer Syndicate and found that 14 of the 15 Don Sturdy books were not by the principal ghostwriter of Tom Swift but were by another one of their prolific writers. I felt good that my youthful instinct seemed on target but the intellectual side of me knows that it can be misleading. Later I read The White Ribbon Boys of Chester (1916) and was convinced that I knew the ghostwriter. The letters showed that it was another.

Readers of juvenile series books firmly believe they can pick out certain ghostwriters or cases where an idea for a plot was used by one author and then another. They usually assume that one author "stole" from another. Yet, there are documented cases where one newspaper clipping inspired multiple writers who wrote similar stories concurrently and turned them in. I am thinking of the fishing with dynamite plot here. But on the other hand, some of the cases where story elements are copied occur when an author (or publisher who owned the story) reused old material.

This sort of thing interests me which is why I sift through this thread for the occasional gems.

James

1073Petroglyph
Apr 29, 2022, 3:16 pm

>1043 faktorovich: I did not find any such clear-cut changes in an author's style with age, as instead I have found that style remains consistent for a professional author across a career.

That does not mean it's impossible, though. Have you read this paper?

Forsyth, Richard S. "Stylochronometry with Substrings, Or: A Poet Young and Old." Literary and Linguistic Computing 14, no. 4 (1999): 467-478

It gets cited in a few handbooks of stylometry and translation studies, so I suppose these findings are known in those fields. What do you make of Forsyth's points?

>1042 faktorovich: The Vulgate is generally believed to be a 4th century Latin translation of the Bible from other languages. Thus, its linguistic style across all of its sections should reflect the style of its (probably) single translator, and not the original styles in other languages

Rybicki and Heydel (2013) argue that, under certain circumstances, the hand of the translator can be seen in their translations.

J. Rybicki, M. Heydel, The stylistics and stylometry of collaborative translations: Woolf’s Night and day in Polish. Lit. Linguis. Comput. 28(4), 708–717 (2013)

Do you think the conditions they consider apply to the translator of the Vulgate? Would the Bible being regarded as sacred change the conditions under which that is possible? I'm thinking of Wulfila's translation into Gothic, which closely follows Greek word order (which makes for some weirdly-formulated Gothic, I can tell you) -- except in those cases where the Greek order would make the Gothic ungrammatical.

1074Petroglyph
Modificato: Apr 29, 2022, 4:17 pm

>1072 Keeline:

Yeah, Academic publishing houses are atrocious.

FWIW, I searched "fake news" in the Savoy book, and it only crops up in analyses of Twitter corpora, and in particular with reference to this book:

K. Shu, H. Liu. 2019. Detecting Fake News on Social Networks. San Francisco: Morgan & Claypool

So if you're interested in that, I'd put this book on your wishlist instead.

Re: POS, yeah: that's Parts Of Speech. Sorry, forgot my audience there for a sec.

Just because I have it here in front of me: the Savoy book has a list of acronyms. Download the "front matter" pdf from the publisher page.

I have been gathering the software to make tests on texts for which I know a lot about the authorship from extrinsic evidence

You're on a Mac, right? Have you tried JGAAP? It was developed for authorship attribution by Patrick Juola, the scholar who used this particular stylometry tool to show that a Galbraith book had most likely been written by JK Rowling. Here is a blog post (which I warmly recommend!) in which he explains his method.

(disclaimer: I haven't tried JGAAP myself)

If you need, say, a list of function words in English to run your tests on, let me know -- I've got one handy.
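
For the curious, the kind of feature table most of these tools start from is easy to mock up by hand. A minimal sketch -- not JGAAP's actual pipeline; the file names and the ten-word list are placeholders for illustration:

function_words <- c("the", "of", "and", "to", "in", "that", "it", "with", "but", "not")
texts <- c(galbraith = "cuckoos_calling.txt",              # hypothetical file paths
           rowling   = "casual_vacancy.txt")
freq_profile <- function(path, words) {
  toks <- unlist(strsplit(tolower(readLines(path, warn = FALSE)), "[^a-z']+"))
  toks <- toks[toks != ""]
  sapply(words, function(w) sum(toks == w)) / length(toks)   # relative frequencies
}
round(t(sapply(texts, freq_profile, words = function_words)) * 100, 2)   # per cent of all tokens

Each row is a text and each column a function word; distance measures (Burrows' Delta and friends) are then computed over tables like that, usually with a few hundred words rather than ten.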

the fishing with dynamite plot

I can just smell the pages of the adventure books that would have that plot. Ah, good times.

1075Petroglyph
Apr 29, 2022, 4:03 pm

>1069 rosalita:
I know, right? But 14 books!

1076anglemark
Apr 29, 2022, 5:02 pm

I (Johan here) have long since stopped reading Faktorovich's tedious posts. Just reading the replies gets me all the fun with a minimum of the incredulous pain.

1077susanbooks
Apr 29, 2022, 5:03 pm

>1069 rosalita: if so, this is one of the very, very best. I mean, how can you possibly argue with EVERYTHING?

1079Keeline
Apr 29, 2022, 5:54 pm

>1074 Petroglyph:

This is what I had in mind. I had to look up the reference to it.

The next question is the source of plots. For the purpose of discussion, these have been separated into ten general divisions, as follows:

1. Newspaper accounts. Practically every paper contains some account out of the ordinary. Here are two examples that were utilized.... Again, some years ago, in the great Northwest, a man threw a stick of dynamite, with the fuse lighted, into a body of water to kill fish. His dog, not understanding the motive of the act, swam out, caught the sputtering explosive in his mouth and started gleefully back toward his master, who promptly fled..... Jack London sold a story based on the second idea—and so did two other authors. All were accused of plagiarism, although an investigation proved conclusively that none had borrowed from the other. This source, therefore, while it furnishes the basis or nucleus of a great number of stories, must be used discriminatively. Stories full-blown should be rejected promptly; suggestions—beginnings, hints, conclusions, characters, situations, and the like—should be accepted and the story developed about them.

— Quirk, Leslie W. "A Course in Short Story Writing. No. 3. The Plot." The Editor, Mar. 1908, p. 117-118.

Leslie W. Quirk was the editor of The Editor which was a trade magazine for writers and other literary workers. He was also an author of some series books. Jack London needs no introduction.

I did a bit of sleuthing for the clipping that London and the others probably saw.

This is the Wikipedia page for the Jack London story, "Moon-Face" with a summary and some mention of the multiple stories that used this idea:

In July 1901 {actually it was 1902}, two pieces of fiction appeared within the same month: London's "Moon-Face", in the San Francisco Argonaut, and Frank Norris' "The Passing of Cock-eye Blacklock", in Century Magazine. Newspapers showed the similarities between the stories, which London said were "quite different in manner of treatment, but patently the same in foundation and motive." London explained both writers based their stories on the same newspaper account.

The Jack London story, "Moon-Face" was published in The Argonaut of San Francisco, 21 July 1902. "The Passing of Cock-eye Blacklock" was published in Century magazine, July 1902.

One story that likely was seen by both authors was called:

A FISH TALE'S MORAL
STORY OF A MAN WHO WENT DYNAMITING FOR TROUT.
_____
His Faithful Dog Retrieved the Charge and Burning Fuse and Tried to Bring It to His Owner—Latter Fled Wildly—Dog Dead

— (Spokane, WA) Spokesman-Review, 9 May 1901, p10.

Such a story was probably published in multiple papers at this time. This nine paragraph account is set in the Kootenays, British Columbia, and tells of a man named "Manager Cronin" from a particular mine who was hungry and didn't want to wait for the fish to bite.

But this kind of story, like crafted jokes, may not even be real. I found an earlier example with the same premise but different details:

HE RETRIEVED.
The Dog Brought the Dynamite Back.
An Indiana Farmer's Queer Fishing Experiment
That Nearly Cost Him His Life and Did Cost a Dog

— Oakland Tribune, 8 Aug 1894, p. 3.

This story is datelined Muncie, Indiana and refers to a "Farmer Sunderland" in a similar account but with just two paragraphs.

It was (and probably still is) exceptionally common for authors to keep clippings of nuggets of stories like this that they might develop. Of course, when one is too good and several authors of prominence all use it, it tends to be noticed.

This isn't like the case where someone takes a poem or story or full novel and copies it word for word, perhaps changing only names and locales, and then presents it as their own work. Such things did occur, especially when they figured few would notice across the Atlantic. I've written about a few such examples of this.

James

1080rosalita
Apr 29, 2022, 6:03 pm

>1075 Petroglyph: Given the amount of time devoted to this thread, it's hard to imagine there will ever be a 15th!

>1076 anglemark: That's just the right attitude to take, I think. Especially if, like me, you are not equipped to attempt any sort of fact-checking or error correction.

>1077 susanbooks: Good point! I salute the commitment to the bit, at the very least.

1081Petroglyph
Apr 29, 2022, 7:13 pm

>1076 anglemark:

You're not missing much -- there's a lot of repetition in her posts.

I'm having fun trying out random queries on a corpus in a way that my previous research in corpus linguistics would not have covered, reading about some computational approaches to text mining and authorship attribution, and maybe even some graphing software. I've encountered some fun papers on Shakespeare, too -- I might post a summary of one of those, if I feel so inclined. So I'm approaching this as a neat little learning opportunity that's a welcome distraction from two years of Covid and an impending war.

Do you think I could put this thread on my CV under the heading "Popularisation of Science"? My university encourages that kind of outreach...

1082Petroglyph
Modificato: Apr 29, 2022, 7:27 pm

>1079 Keeline:

Well, I must say I was not imagining a dead dog. That makes it a sadder story than it should be. It's a very Why Women Live Longer kind of story, though.

1083faktorovich
Apr 29, 2022, 8:59 pm

>1065 susanbooks: I explain the concept of "subversion" in my "Rebellion as Genre" book; it is the process of indirectly undermining something with underhanded tactics; in contrast with direct opposition, it is a sneaky form of sabotage.

1084Petroglyph
Apr 29, 2022, 9:19 pm

>1057 faktorovich: the translation into English out of Latin (and other languages) was so heavy-handed that most of these quantitative measures shifted by as much as doubling

Do I really need to break out the Spongebob capitals again? What you have figured out is that text expansion is a thing in translation.

Let's compare the Vulgate (.txt file from here), and the KJV and the Douay-Rheims from PG, with all the non-biblical stuff removed. (I had to clean the Vulgate text a little: I expanded the ligatures (e.g. æ and Æ to ae and AE); I also removed the abbreviated book title that preceded each line because with all those Gen, Exo, Lev, ... left in, the total number of "words" was over 647,816.)

Using the same online service as in >1055 anglemark: we get this:

Vulgate:      612,005

Douay-Rheims: 988,956 (=376,951 words more; the original expanded by ~62%)

King James:   790,031 (=178,026 words more; the original expanded by ~29%)

These are entirely unsurprising numbers: Latin, as a highly inflectional language, is naturally more compact, in terms of words. It uses nominal case endings and a considerable array of verb tenses where English uses prepositions and multiple auxiliaries; fourth-century Latin does not have articles (the, a(n)), which are some of the absolute most frequent words in English -- the can account for ~8% of the words in a regular English text. And indeed: in the KJV, 8.08% of the text is taken up by the word the.

That means that, of the ~29% of textual expansion between the Vulgate and the KJV, over a quarter (8.08 / 29 ≈ 28%, as a rough back-of-the-envelope figure) consists of just the word the.
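
If anyone wants to reproduce these numbers, the counting itself is trivial; a rough sketch, with placeholder file names, and with the caveat that the exact totals will depend on how you tokenize and which cleaned-up source files you use:

count_tokens <- function(path) {
  toks <- unlist(strsplit(tolower(readLines(path, warn = FALSE)), "[^a-z']+"))
  toks[toks != ""]
}
vul <- count_tokens("vulgate_cleaned.txt")
kjv <- count_tokens("kjv_cleaned.txt")
length(vul); length(kjv)                           # raw word counts
(length(kjv) - length(vul)) / length(vul)          # expansion: ~0.29 on the numbers above
sum(kjv == "the") / length(kjv)                    # share of "the" in the KJV: ~0.08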

Let's take a look at the first clause in Rev 22:19 (because it happened to be on screen, and because it's simple enough for my rusty Latin to decipher):

Vulgate (9 words)
Et  si quis          diminuerit      de   verbis      libri              prophetiae            hujus
and if any.masc.sing remove.fut.perf from word.abl.pl book.gen.neut.sing prophecy.gen.fem.sing this.gen.fem
"and if anyone will have removed from the words of the book of this prophecy"

Douay-Rheims (15 words)
And if any man shall take away from the words of the book of this prophecy

9 words vs 15, or an increase by 66%. This, but across the entire bible, accounts for the large difference in word count between the Vulgate and KJV or DR. It's text expansion from an inflected language to a not-really-inflected language such as English.

If, as you claim, the KJV contains intentional deformations of the Vulgate, you should be able to produce one. prosfilaes has repeatedly challenged you to find one, in the 700s of this thread. If you had anything to back up your claims, you'd already have posted them. Either you post a verifiable intentional biased/corrupt/whatever expansion in the KJV from the Vulgate, or we can consider your claims just more kooky poppycock.

All you've done is glance at the increase in number of words between languages and jump to a conspiratorial conclusion. As usual.

1085faktorovich
Apr 29, 2022, 9:21 pm

>1067 Petroglyph: The Brown Corpus is a great example of a horrid strategy for computational-linguistic analysis. It includes 500 different texts with 2,000 word samples per text (= 1 million words) in different genres only in the year 1961. They were tagged on word-types or word-combinations such as "adjective + Auxiliary". "This corpus first set the bar for the scientific study of the frequency and distribution of word categories in everyday language use." This bar was thus set in the wrong place because a random sampling of random texts (a list of newspapers that published these is provided, but the samples themselves were not published, and thus cannot be checked) cannot be used for authorship attribution or even to reach conclusions about usage (if the samples cannot be checked to verify it there were any glitches, or if the system might have made a mistake). My own corpus includes 7.8 million words, so if the total word-count is amazing for you, mine is bigger. It is also better, as I have done, simply to count the percentage of adverbs, adjectives and other types of words, than having to choose with bias between minor linguistic elements such as "verb + Auxiliary, singular, present", as comparing all of these minor elements would create a completely chaotic and infinitely complex comparative diagram. The problem is not that Renaissance texts have not been tagged, but rather that linguistic style is revealed in the common word-types/ punctuation types used in all texts with different degrees of frequency, as opposed to in minor unique elements that might only be used sparingly by some authors, and not by others. And yes, it is important for researchers to do their own tagging, in case computer programmers did the tagging for an existing corpus, and they did not understand some unique linguistic elements in a given corpus that might have introduced a myriad of glitches during the tagging process.

1086faktorovich
Apr 29, 2022, 9:38 pm

>1071 Petroglyph: Having read the article about it, I am certain I understood it correctly, and if you think I did not, you are the one who is misunderstanding it. "A jumble of upper- and lowercase letters, like a ransom note created with pasted letters cut from magazine clippings. (The idea, so far as there is one, is that the erratically capitalized line is meant to represent a sort of mocking imitation of the original line.) This is now the canonical format of the Mocking Sponge meme." In summary, the Mocking Sponge is designed to appear "like a" threatening "ransom note" that disorients and confuses while suggesting a death-threat, all while quoting somebody else's statement and making it seem as if they have made a threatening message or have said something worthy of extreme derision. This is an extremely anti-social form of communication that yells out an insult about somebody else's words, as it attempts to censor or cancel these words as inferior to the brilliance of the mimicker. So unless you are attempting to post "ransom notes" in this thread, the Mocking Sponge font has no place in a scholarly discussion. I would never use this font because I want to maximize readers' comprehension of what I am saying, and what I am quoting; so I would not disrupt comprehension by using a deliberately comprehension-limiting font.

1087faktorovich
Apr 29, 2022, 9:49 pm

>1072 Keeline: "I read The White Ribbon Boys of Chester (1916) and was convinced that I knew the ghostwriter." The difference between your approach and mine with the Renaissance project is that I considered thousands of bylines or nearly all of the bylines that wrote in the genres I was researching before reaching my attribution conclusions for the six ghostwriters. Instead of "youthful" intuition, I invested hard work and thorough analysis of the evidence in my experiment and the accompanying research to check the quantitative findings. And there are thousands of specific whole passages that I found plagiarized (or the equivalent of copied-and-pasted) between texts with different bylines and titles across the Renaissance; there are also several books that are near or complete plagiarisms of other books (there were scandals about some of these like "Gervase Markham's" horse books). Plagiarisms of sections in "Shakespeare" from otherwise bylined preceding texts have been a subject for scholarly study for centuries; you can find some if you open almost any scholarly edition of "Shakespeare".

1088faktorovich
Apr 29, 2022, 10:10 pm

>1073 Petroglyph: Yes, I have analyzed these types of studies, and have proven them to be wrong. Examples of studies I reviewed include:

Mathis Gilles, “Looking for Rhyme and Prosodic Patterning in Richard III”, Bulletin de la société d'études anglo-américaines des XVIIe et XVIIIe siècles, 49 (1999), 77-110.

Gary Taylor and Rory Loughnane, “Chapter 25 The Canon and Chronology of Shakespeare’s Works”, Authorship Companion, G. Taylor & G. Egan, eds., The New Oxford Shakespeare (Oxford Scholarly Editions Online; Oxford: Oxford University Press, 2017).

I explain my findings in this paragraph in Volumes 1-2 of BRRAM:

Past studies such as Gary Taylor and Rory Loughnane’s have focused on measuring the percentage of rhyming lines, triple rhymes, end-stopped lines, feminine versus masculine endings, caesural pauses, and other poetic elements. Here is a sample of the percentages of rhyming lines from highest to lowest according to Mathis Gilles’ data: Midsummer Night’s Dream (66%: 1595/6; 1600: Percy), Love’s Labors Lost (62%: 1595; 1623: Percy), Comedy of Errors (over 42%: 1590-4; 1623: Jonson), Romeo and Juliet (over 42%: 1594-6; 1597: Percy), Richard II (around 20%: 1595; 1623: Percy), Macbeth (11%: 1606; 1623: Percy), 1 Henry VI (around 10%: 1590/2; 1623: Percy), Richard III (around 5%: 1591-3; 1623: Percy), and Winter’s Tale (near-0%: 1609; 1623: Jonson). This data indicates that the plays with the highest percentages, or with over 20% of rhyming lines, had been performed before around 1598; the plays with near-0% rhyming lines were largely first-performed closer to the final years of “Shakespeare’s” staging career near 1609. Most previous studies have generally agreed that “Shakespeare’s” decreased the percentage of rhyme in his plays over time. An example of an early rhyming-preference in Percy’s 2 Henry VI (1594) includes this exact rhyme: “And shall these labors and these honors die?/ Shall Henry’s conquest, Bedford’s vigilance,/ Your deeds of war, and all our counsel die?” Both Percy and Jonson appear randomly across this rhyme percentage scale. The 27-tests suggest an alternative explanation for this chronological decline in rhyming. William Byrd’s signature appears in the rhyming Addition III from the Apocrypha Sir Thomas More, but this play as a whole is dominated by Percy’s signature. This example proves that Byrd assisted the Workshop with musical procedures such as measuring and rhyming; Byrd might have insisted on being contracted for these because he held the music monopoly. Byrd’s death in 1623 might have released his versification copyrights for some of the Workshop’s earlier dramas for publication in the First Folio. The Workshop’s partnership with Byrd appears to have disintegrated after 1596, and this was reflected in the steep drop in the number of rhyming lines after this juncture.

I would have to repeat Forsyth's experiment, and research the various potential explanations for his findings to explain why it was wrong. It is theoretically possible Forsyth was right in his conclusion, if he found some minor variations in language usage that change as vocabulary grew, or linguistic preferences changed as this particular author changed. But if the 27-tests I designed are applied systematically to texts, the works an author wrote in his youth and old age are easily identifiable as the work of a single authorial hand (for example, I matched Percy's earliest dramas from 1584 to his latest dramas from 1648). I tested several other age-related studies, and found other explanations or contradictory evidence that disproved the idea that age has a significant impact on authorial style.

Yes, broadly speaking, I have found the hand of the translator to be identifiable (and not the hand of the underlying author being translated) with my 27-tests. The sacred nature of the Bible is significant because it has biased linguists that have attempted to attribute it, so that some have assigned sections to Apostles etc., as if these were actually possible conclusions for any post-original translations.

1089faktorovich
Apr 29, 2022, 10:19 pm

>1079 Keeline: In my translation of some of William Percy's variedly bylined plays (and also in some of the other ghostwriters' projects), I found sections he plagiarized from never before published handwritten archival manuscripts claimed to be from decades or centuries earlier, many of which have never even been published since. Some of these documents were held in the Northumberland/ Percy family archives that would have been private, or inaccessible to anybody from the general public, even if they were highly intent to plagiarize from these sources. You see the number of volumes in the BRRAM series means I have found these types of plagiarism cases that cannot be explained by any other means that my 6-ghostwriters attribution conclusion (and they have to be these specific 6 people with very special access etc.).

1090faktorovich
Apr 29, 2022, 10:41 pm

>1084 Petroglyph: While you can imagine that a 62% increase is based on some differences in foreign languages (English vs. Latin etc.), there is no such excuse when comparing "King Lear", whose Modern English version has 26656 words, while the Early Modern English version had 25879. This is indeed a pretty slight difference, but "Hamlet's" modern version has 31121 words whereas the Early Modern version has only 17255 words. And these types of extreme, doubling word-count differences are normal in this Renaissance corpus.

Just as with the KJV/Vulgate there are divergences between these versions of "Hamlet", as in the Early Modern version:

2. Sit downe I pray, and let vs once againe
Assaile your eares that are so fortified,
What we haue two nights seene.

And the Modern Taylor 2016 version:

BERNARDO Sit down awhile,
And let us once again assail your ears,
That are so fortified against our story,
What we have two nights seen.

This is indeed a tiny example, as only 2 words have been added, but the order of the words, the spelling, and the words themselves have been altered in several ways that would have changed the linguistic counts.

You have just produced an example of differences between the Vulgate and the English version (even if the change of language itself isn't noticeable).

1091Petroglyph
Apr 30, 2022, 12:10 am

>1085 faktorovich:

"one million words, you guys!! -- I really should have put an explicit sarcasm marker there, shouldn't I?

Faktorovich -- who has access to thousands of corpora containing tens of millions of words (if she only knew where to look), who runs a business through the internet, and who works with computers all day long -- reads a wikipedia page, briefly considers the sixties/seventies and their standards for "digital" and material gathering and the state of computational linguistics, and she finds they are her own.

a coMpLEtEly cHaoTic aND InfInitElY CoMpleX ComPARatiVE diAGrAm

So complex that a mere one-million word corpus from the sixties and seventies could manage it.



It is also better, as I have done, simply to count the percentage of adverbs, adjectives and other types of words, than having to choose with bias between minor linguistic elements such as "verb + Auxiliary, singular, present", as comparing all of these minor elements would create a completely chaotic and infinitely complex comparative diagram.

There are so many quotes from you in this thread that make it blindingly obvious that you are just play-acting at being a scholar, but that you lack the substance to do so. This is one of them. A proper scholar, when faced with "it would be very complex to calculate this by hand", would try and find ways to make things calculatable and presentable. You know: research? Develop methodologies? Or they could look around and see if the problem had already been solved (say, in the seventies? By some early corpus linguists? and half a century of research and improvement since?). But no. Not Faktorovich. When faced with "This math too hard" you just ignore the matter and think that, by only taking into account the bits that are immediately, intuitively obvious to you, you have done the right thing. Possibly even a clever thing. You also tell yourself that you're right in staying entirely within your comfort zone for some sciencey-sounding reason, such as "it's going to create bias, I can tell in advance". Who knows, maybe you felt impressed at your own reasoning and congratulated yourself when you came up with it.

I am so, so glad that my mental world is not limited to a Faktorovich-size imagination. I'm grateful for working with clever, imaginative, competent and creative people who challenge me and allow me to share in their process of discovery. I may have been taking them for granted.

it is important for researchers to do their own tagging, in case computer programmers did the tagging for an existing corpus, and they did not understand some unique linguistic elements in a given corpus that might have introduced a myriad of glitches during the tagging process

I'm glad Faktorovich has graciously consented to let us know how she would go about training and refining and improving an automated tool. She's already put her mind to it, and she's just as quickly concluded that it would be difficult to implement and that a single iteration would contain errors. Best to abandon the whole idea.

The state of faktorovichian scholarship is depressing. I should tell the people I work with how much I appreciate them. BRB, sending a mass email.

1092Petroglyph
Modificato: Apr 30, 2022, 12:20 am

1093Petroglyph
Apr 30, 2022, 12:10 am

>1088 faktorovich: Yes, I have analyzed these types of studies

Lol. Don't change the subject to other papers you may have talked about in your self-published rubbish. I asked you a question about the Forsyth paper.

Nothing in that post leads me to believe you've even glanced at more than the abstract. Just to confirm that you actually can access that paper, can you tell me what the caption to Table 3 is in the Forsyth paper?

You're also avoiding my translation question, again by talking about stuff you'd rather be highlighting instead of answering the question.

Would the Bible being regarded as sacred change the conditions under which the hand of the translator can be seen in their translations? I'm not asking about present-day linguists (or your misinterpretations of their results). I'm asking you about 4th-century translation practices and whether they are different from today's when sacred literature is involved, given 4thC valuations of "sacred".

Quick question: in that Rybicki and Heydel paper, what is the caption of the second graph on page 712? I can't quite read it. Perhaps you could help me out?

Until you show me you've actually engaged with these papers, I will be justified in treating your non-answers as rubbish.

By the way: there's no shame in admitting you don't have access to these papers. Most people don't. But most other people wouldn't casually pretend they have soundly disproven them, either.

1094Petroglyph
Apr 30, 2022, 12:11 am

>1090 faktorovich:
No response to the direct question in that post. Just more changing the subject and comparing apples to oranges.

Kooky poppycock it is.

1095Petroglyph
Modificato: Apr 30, 2022, 12:25 am

In this edition of Petroglyph's After-Dinner Stylometry Corner I'll be taking a look at Arjuna Tuzzi & Michele A. Cortelazzo's 2018 paper "What is Elena Ferrante? A comparative analysis of a secretive bestselling Italian writer." In this study, Tuzzi and Cortelazzo (henceforth T&C) assemble a representative corpus of contemporary Italian novels and use various methods to see which author(s), if any, are most similar to Elena Ferrante -- or possibly even indistinguishable from her. They conclude that none of the authors in their corpus are likely to be Ferrante, but that Domenico Starnone's novels are similar enough to suspect that he knows more about the matter than he lets on. {Extra-textual note: For the record, Starnone has always denied being Ferrante}.

Whether it is ethical to try and unmask a secretive pseudonymous author, and a female-presenting one at that, is a different question that I'm not concerned with here. (Personally, I don't think it is, but I also don't think that this particular cat is going to be put back in the bag). The reason I'm summarizing this paper here is because it's a) a stylometric analysis about a case that people might actually have heard about; b) it's straightforward and uses no idiosyncratic methodologies; and c) it is a good illustration of how authors select tests and why some tests may be more appropriate than others.

The paper itself (doi:10.1093/llc/fqx066) is paywalled, but here is a pdf copy (if dropbox wants you to sign in first, just close that pop-up and the pdf should be accessible).

In summarizing this paper, I'll be using my own headings and numbering. Also: right-click the thumbnails to access larger images.

1. Corpus selection

T&C looked at 150 contemporary Italian novels by 40 authors from all over the country; 13 of these authors are women (including Ferrante). Since Ferrante writes a) novels in b) Italian, T&C did not include non-novels, and they excluded translations into italian as well. For the same reason, a volume of Ferrante's letters, essays and interviews was excluded from consideration, too. Furthermore, T&C made sure to include the authors who have at some point been proposed as the author behind the Ferrante pseudonym, and they took care to include a batch of authors from Ferrante's likely Region (i.e. Campania); the corpus is also heavily weighted towards popular authors (i.e. many books sold), and authors who've been characterized as "literary" -- two more characteristics of Ferrante's output.

2. Analyses

This corpus was analyzed in 4 ways.

2.1. The position of Ferrante and her novels in the corpus

2.1.1. Ferrante and her novels in the entire corpus:



The author closest to Ferrante is Domenico Starnone (part A); the novels closest to Ferrante's are Starnone's (part B).

2.1.2. Ferrante and her novels vs all female authors and their novels:



No female author overall is close to Ferrante; some novels by Milone, Parrella, Sereni and Murgia are close to Ferrante's novels.

2.1.3. Ferrante and her novels vs all authors from Campania and their novels:



The closest author overall is Starnone; the closest novels are Starnone's later works, though his earliest novels are quite different (they're in the bottom left quadrant). Once Ferrante begins publishing (in 1992), Starnone's novels cluster with hers.

2.2. Distance measures

T&C want to use a distance measure: they want to compare all authors and all novels with each other mathematically and come up with a measure of "difference" or "similarity" between any two novels. That measure of difference can then be plotted in a graph and translated into distance -- similar novels are close together, dissimilar novels are further apart, and the relative distance between any two authors/novels translates directly into how similar/different they are.

For this particular paper, T&C will be using a measure called Labbé's distance, the calculation of which I've hidden behind spoiler tags for those who want to skip such things. From each of the 150 novels a random chunk of 10,000 words was taken. Every pair of chunks (A and B) had their individual word frequencies added up (for all the words in those chunks), and the distance (or dissimilarity) between chunk A and chunk B corresponds to how different A and B are from that sum. (The idea being that if A and B were penned by the same author, their word frequencies would be similar. Or at least: more similar than if either chunk were by a different author). Every single chunk was compared to the other 149 chunks. And then this process was repeated 500 times with another random 10,000 word chunk from all 150 novels. The distance between any two novels is then calculated as the mean of their 500 sample-to-sample comparisons. The end result is a matrix with dimensions 150x150 and 11,175 cells, which can be translated into a distance graph.
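
For the mathematically inclined, here is a minimal sketch of the distance calculation as I read it, for the simple case T&C actually use, where both chunks have the same size (10,000 tokens). `chunk_a` and `chunk_b` are assumed to be character vectors of word tokens:

labbe_distance <- function(chunk_a, chunk_b) {
  stopifnot(length(chunk_a) == length(chunk_b))      # equal-sized chunks only
  vocab <- union(chunk_a, chunk_b)
  fa <- table(factor(chunk_a, levels = vocab))       # word frequencies in chunk A
  fb <- table(factor(chunk_b, levels = vocab))       # word frequencies in chunk B
  sum(abs(fa - fb)) / (2 * length(chunk_a))          # 0 = identical, 1 = no words shared
}

The scaling by 2N keeps the result between 0 (the two chunks have identical word frequencies) and 1 (they share no words at all).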

For reference: Faktorovich looks at the six most frequent words. In absolute terms!!! All she's got is the, is, of, in, to, ....

A note on why sampling is performed. You can also skip this if you want: Pre-empting Faktorovich's oft-repeated objections against sampling and automated chunking and all the other stuff that her manual copy-paste-slogfest could not possibly hope to accomplish: yes, of course the novels aren't all 500,000+ words long, so yes, of course there are many stretches of text that are contained within multiple chunks. T&C are well aware of that. No, this is not falsifying data. It isn't nonsensical, either. If you think that, you do not know what randomized sampling means: it means that the researchers do not choose which 10k words they will work with. Could you get results for just one random chunk of 10k words per novel? Sure. But what if you could turn back time, re-run the test and select, at random, another chunk of 10k words? For each of the 150 novels? Would that change the results? Possibly -- impossible to tell a priori. So let's run the same test with a different random chunk that has different beginning and ending points! In fact, let's turn back time 499 times and run a different iteration each time! One iteration of a test might contain random deviations that skew results one way or the other, or that obscure meaningful patterns. But taking the averages of 500 iterations of the same test -- that will amplify any actual patterns of similarity, while sifting out the random ones (because they won't all point in the same direction and they'll cancel each other out. In theory.).

This process, by the way, of running the same test multiple times on random samples and then averaging over all tests, is super, super, super normal and entirely uncontroversial in many areas of statistics. The Bootstrap Consensus Tree from post >1037 Petroglyph: about the New Testament books is a small-scale example of that technique.
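
A sketch of that resampling loop, using the labbe_distance() toy from above -- an illustration of the idea, not T&C's actual code:

random_chunk <- function(tokens, size = 10000) {
  start <- sample(length(tokens) - size + 1, 1)      # random starting point
  tokens[start:(start + size - 1)]
}
mean_distance <- function(novel_a, novel_b, iterations = 500, size = 10000) {
  mean(replicate(iterations,
                 labbe_distance(random_chunk(novel_a, size),
                                random_chunk(novel_b, size))))
}
# Doing this for every pair of the 150 novels fills the 150x150 distance matrix.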


2.2.1. Cluster graph of all 150 novels



Image A shows a cluster graph of all 150 novels; B and C are close-ups of portions of A. Starnone's novels occur in two places: his early novels are in their own little cluster in image C, whereas his later novels (coinciding in time with Ferrante's output) are clustered together with Ferrante's.

2.2.2. Ranking novels similar to Ferrante's and Starnone's.

The novels that are most similar (the least distance) to each of Ferrante's are: a) Ferrante's other novels, and b) Starnone's. Conversely, the novels that are most similar to each of Starnone's are: a) Starnone's other novels, and b) Ferrante's novels. The two bodies of work are by no means identical/indistinguishable, but they are much more similar to each other than to any other author/oeuvre.

A few other authors and novels are fairly similar-ish, too, but the Starnone-Ferrante connection really stands out because it is so systematic. I'm not going to screenshot all these tables, but you'll find them on pp. 693 and 694. T&C note that it almost looks like Ferrante "has more in common with Domenico Starnone than he does with himself" (p. 694).

2.3. Distance measures -- grammatical words

T&C consider the possibility that the systematic similarity between Ferrante and Starnone might be due to similar themes and settings: both bodies of work tend to centre around the lives of Neapolitan families in the 1950s and onwards. This all but guarantees many lexical similarities that could inappropriately skew the results.

In order to remove this potential skew, T&C turn once again to Labbé's distance, only this time they count just the grammatical words -- 430 unique function words, to be exact, an exhaustive list of which is provided in Note 3 (pp. 701-702). Another difference from the last test is that the chunk size this time around is 5,000 words. But once again, they perform 500 iterations of this test and write the averages in a 150x150 distance matrix.
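
A note on the calculation side: once you throw away everything except the function words, the two chunks will no longer contain exactly the same number of tokens, so the equal-size simplification from my earlier sketch no longer applies. The general form of the distance, as I understand it, rescales the larger text's counts down to the smaller text's size; roughly:

labbe_distance_general <- function(tokens_a, tokens_b) {
  if (length(tokens_a) > length(tokens_b)) {         # make A the smaller text
    tmp <- tokens_a
    tokens_a <- tokens_b
    tokens_b <- tmp
  }
  vocab <- union(tokens_a, tokens_b)
  fa <- table(factor(tokens_a, levels = vocab))
  fb <- table(factor(tokens_b, levels = vocab))
  expected_a <- fb * length(tokens_a) / length(tokens_b)   # rescale B's counts to A's size
  sum(abs(fa - expected_a)) / (2 * length(tokens_a))
}

Filtering is then just something like labbe_distance_general(chunk_a[chunk_a %in% fw_list], chunk_b[chunk_b %in% fw_list]), where fw_list stands in for their 430-item function-word list.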

The results are almost identical to those under heading 2.2.2: each of Ferrante's novels is most similar to Ferrante's and Starnone's novels; each of Starnone's novels is most similar to Starnone's other novels and those of Ferrante. The distance to other authors/novels has increased: they are now further away from either Ferrante or Starnone. You'll find these tables on pp. 696-697.

T&C add this:
in some cases, we see that Elena Ferrante has written novels that are more similar to those by Starnone, than other novels written by Starnone himself. Working with only the grammatical words, thereby eliminating any possible similarities due to factors linked to historical, social, and cultural contexts and general content in the novels, the similarity between Starnone and Ferrante is reinforced and the classification method mix up Ferrante and Starnone as if dealing with the same hand. (p. 697)


2.4. Qualitative analysis

Finally, given the results of the quantitative tests so far, T&C check whether Ferrante and Starnone share verbal tics -- are there words that both of these authors use more often than other authors in this corpus? Turns out there are:
three emblematic examples of words that are not common in Italian but occur in Elena Ferrante and Domenico Starnone’s novels: risatella {little laugh, noun}, present, both in the singular and the plural, only in Ferrante and Starnone; sfottente {teasing, adjective}, extensively seen in Ferrante (forty-three occurrences) and Starnone (twenty-eight occurrences) and found in the remainder of the entire corpus only twice more (in two novels by two different authors); malodore {stink, noun}, this variation is present solely in Ferrante and Starnone, while in the rest of the corpus only one author, Francesco Piccolo, uses a similar word with a variation in the spelling, maleodore (p. 697)
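
Checks like that are easy to script, by the way. A quick sketch, assuming `tokens` is a named list with one character vector of word tokens per author:

shared_rare_words <- function(tokens, a = "Ferrante", b = "Starnone", max_others = 1) {
  vocab  <- lapply(tokens, unique)                   # each author's vocabulary
  shared <- intersect(vocab[[a]], vocab[[b]])        # words both target authors use
  others <- vocab[setdiff(names(vocab), c(a, b))]
  # how many of the *other* authors ever use each shared word?
  used_by_others <- sapply(shared, function(w) sum(sapply(others, function(v) w %in% v)))
  shared[used_by_others <= max_others]               # e.g. risatella, sfottente, malodore
}

Words that come out of a filter like that are then checked by hand, which is the "qualitative" part.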


3. Conclusion

Ferrante stands out in this corpus as a singular voice -- there are no other authors that take up the same space in the PCA graphs, or that are indistinguishable from her, in this corpus (which has been selected to include authors suggested to be Ferrante, and authors that may reasonably be supposed to share similarities: female, from Campania, popular/best-selling).

In their conclusion, T&C propose various outcomes (p. 698).
  • It is possible that Ferrante represents a new genre, or a new way of writing.
  • It is also possible that Ferrante's novels are the product of multiple authors working together.
  • Then, there is the distinct possibility that Ferrante is an author who has only ever published Italian novels under that pseudonym (i.e. there are no other Italian novels by the same author but under a different pseudonym).


At this point in the conclusion it may be useful to recall an earlier comment (p. 687): when T&C excluded translated novels from their corpus, they noted that this excludes the work of Anita Raja, who had been identified by investigative journalist Claudio Gatti as Ferrante {Petroglyph's extra-textual note: Gatti bases his suspicions on the increased frequency of large royalty payments from Ferrante's publisher to Raja after Ferrante's global success}. As it so happens, Raja is married to an author called Domenico Starnone.

I will end by saying that T&C do not positively identify the pair Starnone & Raja as Ferrante. They do, however, conclude their paper like this:

Of the thirty-nine authors that have been taken into consideration, Starnone is the only author who demonstrates clear-cut and consistent similarities with Ferrante. This affinity cannot be simply explained on the grounds of belonging to a common social-cultural group. At most, it may only make sense in the light of a strong link between Starnone and the author who signs their work as Elena Ferrante. {...} it is rather difficult to imagine that Starnone has not played any role in the planning and/or the drafting of Ferrante’s work. It is difficult to precisely define his role: he could also be just one of the hands and heads that have made up the Ferrante phenomenon, but he has left his mark on it in some way. There is a good chance that Domenico Starnone knows ‘who is’, or rather, ‘what is’ Elena Ferrante.


"What is Elena Ferrante" is also the title of this study. T&C do not come out and say directly that the Ferrante books are the product of a collaboration between the married couple Starnone and Raja, because the corpus for this paper cannot possibly prove such a contention: there are no Italian novels by Raja -- only translations. They are very careful not to overstate the conclusions they are able to draw from this particular corpus.

1096Petroglyph
Apr 30, 2022, 12:17 am

Final notes

So. What does Petroglyph think the relevance of this study is for this thread?

Preponderance of evidence is very important. In T&C's Ferrante study, Starnone is consistently the top candidate for every test they perform. Other people that may be (near-)top candidates in one or two of the tests turn out to be incidental hits when seen against all results.

The same principle is applied in this blog post, where Patrick Juola explains how he was approached to determine whether the author Robert Galbraith was, in fact, JK Rowling's pseudonym. He describes the four tests he performed, and notes that the "only person consistently suggested by every analysis was Rowling, who showed up as the winner or the runner-up in each instance."

He adds:
At the same time, we can do some crude statistics about the likelihood that a randomly chosen author would have a style that similar to Rowling, and by extension how strong this suggestion really is. Out of four candidates, Rowling was consistently #1 or #2 (i.e., in the more similar half) of each feature chosen; it's therefore only 50/50 that a randomly chosen author would be in the more similar half. With four studies (handwaving independence assumptions), there's therefore only one chance in 16 that the person would "pass" all studies as being similar to Rowling. If we needed a stronger suggestion, we could easily gather more data, more distractor authors, or simply run more experiments on different variables.

So again: do multiple tests, and go with the preponderance of the evidence.


You may wonder: where does Faktorovich draw the line? Well: at ~8-10 positive results out of 27 "tests" (~29%-37%). She does that because there are fewer than three instances in her corpus where there are more than, say, 24/27 "similarities". I'm not lying:
The highest number of matches occurred in two 25/27 outputs for two nearly identical versions (both with modernized spelling, but alternatively edited) of 2 Henry VI and its alternatively-titled Two Famous Houses of York and Lancaster. There were no instances of matches between 20 and 24 tests. At 19/27, two “Thomas Nashe”-bylined and Verstegan-ghostwritten texts matched: Almond Parrot and Terrors Night. (...) In contrast, most of the texts in this corpus include some degree of editorial or writerly assistance from others. If a text matched any other text on over 13 tests, it tended to have a single dominant ghostwriter without equivalent matches to rivals’ groups. The majority of matches had lower match-levels between 10 and 13 tests. Texts with 8-10 matches were particularly likely to have been produced by two or more ghostwriters in collaboration. There were two matches at 18-tests: the two versions of the Percy-ghostwritten “Shakespeare” play 3 Henry VI and its alternately-named Richard Duke of York version, as well as Percy’s Tempest and Two Noble Kinsmen. Across all outputs, the median number of matching tests was 5; in other words, half of the outputs fell at or below 5/27-matches. The most common output was 4. The upper fence (above which results were technically “outliers”) was 10.5. Given these statistics, matches at 10 or greater were all outliers, or indicated very unusual and statistically significant similarities. (Re-attribution pp- 33-34)

(compare >290 faktorovich: and >305 faktorovich:)

In her corpus there were two works with a "sImIlArItY sCoRe" of 25/27 (and that was two differently-spelled versions of the same work); and two each with a "sImIlArItY sCoRe" of 18 and 19. That's it. That is the grand total of all of Faktorovich's copying and pasting and changing percentages to 1/0 and assigning pattern names to barely-different sequences of the six most frequent letters and words. Two re-spelled versions of the same text.

If that isn't a fitting end to all her rubbish, I don't know what is. She couldn't even match multiple works of the same author together.

The T&C paper (and the Darmon et al. paper, and the Juola blog post, etc.) are examples of the standards that professionals set for themselves. Faktorovich, well, she counts whatever tests show hits as relevant (8/27), and ignores all the misses (19/27) as irrelevant. And this, she claims (in >165 faktorovich:), means that her method is "entirely logical and unbiased".

She's 0 for 68 in the poll in >458 Petroglyph:. I'm sure it's just because she is a put-upon woman, fighting against a white establishment. Yeah. That's the only logical and unbiased explanation.

(Note to Faktorovich: that last paragraph is sarcastic.)

1097Stevil2001
Apr 30, 2022, 8:27 am

Petroglyph, in addition to everything else, your meme game is solid.

1098lilithcat
Apr 30, 2022, 8:51 am

>1096 Petroglyph:

I wonder if T&C considered the possibility that the works were, in fact, literally ghostwritten by Verstegan, communicated by him to Starnone via ouija board, and then translated into Italian and Neapolitan by Raja?

1099Matke
Apr 30, 2022, 11:41 am

>1096 Petroglyph: Thank you for a clear explanation using data and examples that a non-scholar, non-math-expert person can easily understand. I could detect “a” flaw in the initial argument, but not “the” flaw.

The numerous examples of posts that are mostly or completely non-responsive to questions asked by others show the shallowness of faktorovich’s reasoning, her lack of concrete examples to help further her theory, and her unbelievably stubborn insistence on being right on all questions, all the time. The inability to admit a mistake or to see another’s point of view is stunning.

Like others I’ve followed this rather windy thread; who can say why? But I have learned about close arguments, statistics, linguistics, and textual analysis while here. So thanks to you and many others for an entertaining experience.

1100bnielsen
Apr 30, 2022, 12:25 pm

>1066 Petroglyph: Yes, honestly. But the audience is probably larger here :-)

1101faktorovich
Apr 30, 2022, 1:06 pm

>1091 Petroglyph: The "self-image" drawing is an example of a non-existent, delusional creature, whereas the "LibraryThing" image is of an adorable real animal; so you are arguing that I am in fact real and adorable? The explanatory text under both of the diagrams is about over-complexity. I really hope you will return to focusing on what I am actually saying (regardless of whether it is or is not complex) versus focusing on my animalistic cuteness.

I am in the process of translating "Restitution", an extremely convoluted history book + dictionary. I am focusing my energy on this project entirely. I am not going to digress from it to start an entirely new experiment to check word-order and word-type-combination patterns or strategies for evaluating these for authorial attribution. These are not necessary and do not fit logically with my 27-tests method, which has worked to reach precise results in my Renaissance project. If I had any uncertainty in my attributions, I would have evaluated these and other strategies before settling on the testing method. Without such uncertainty, I am just pointing out briefly to you that I have explored these possibilities when I was choosing the tests that are best to apply, and decided that these strategies were not productive.

1102faktorovich
Apr 30, 2022, 1:45 pm

>1093 Petroglyph: There are 2.2 million books published in the world annually. If there is a book or study out there I am asked about I do one of three things: 1. Glance at it briefly to check if there are any general problems I notice with the methodology, as I did in the Forsyth case (this is necessary to evaluate if it is worth-while to read a study closely). 2. Read it closely and replicate or check the results to write in my own research regarding how the approach is correct or incorrect. 3. Ignore it completely because it is irrelevant to my own research and reading interests. I have a filmographic memory, so I would have remembered Table 3, but since I chose option 1 of only glancing at the abstract and a few other elements in the paper; it would be dishonest for me to look back at the paper and tell you what is in Table 3. You asked for a review copy of my BRRAM series a few months ago, which should have indicated that you have chosen option 2, or to read my BRRAM closely to respond to it. I did not similarly ask you for a review copy of the Forsyth essay, as it is irrelevant to my research. I am not avoiding anything but straying away from my focal research, or the translation of "Restitution". In it, I address several 4th century (or forged later) translation/ theological manuscript mysteries. These are extremely complicating as they involve me translating Latin/ Old German etc. ancient texts, and comparing these with later manuscripts. I explain my findings in the annotations, but taking any one of these annotations out and quoting it here would only confuse you, as you would assume it is my entire point when it is a sentence out of a 200,000 word book. "Would the Bible being regarded as sacred change the conditions under which the hand of the translator can be seen in their translations?" What conditions? If a researcher regards a text as "sacred", this changes their ability to attribute the text to a non-Apostle/non-God etc. entity such as an author, or a ghostwriter. "I'm not asking about present-day linguists (or your misinterpretations of their results). I'm asking you about 4th-century translation practices and whether they are different from today's when sacred literature is involved, given 4thC valuations of 'sacred'." 4th century translation practices? I am not even sure if any of the surviving "4th century" manuscripts were actually written in the 4th century, and were not much later forgeries ascribed to ancient 4th century writers for profit; that's why I said all of these ancient texts should be tested for their age. Neither you nor me can travel back in time to check what "practices" ancient translators practiced; we can only test the surviving texts for their linguistic attributes. I did not apply my tests to 4th century texts because that would be an entirely different corpus, and an entirely different experiment, and I am not finished with my BRRAM Renaissance project.

You are asking me to review this article: J. Rybicki, M. Heydel, "The stylistics and stylometry of collaborative translations: Woolf’s Night and day in Polish". On a brief review, I can conclude that the authors used an absolutely erroneous and irrational method of comparing Polish versus English texts. They never directly address the central problem of how texts in two different languages can be compared to each other in vocabulary frequency; since the two texts will have entirely different dictionaries of used words. All they say on the matter is: "She worked with the famously long and intricate Woolfian sentences, the more so that the Polish language, with its extremely flexible sentence structure, locates most of its rhetorical and pragmatic devices here. Also for this reason, most-frequent-word analysis was a well-suited approach to this experiment in translatorial attribution." One problem is that the researcher is fixating on the personality or experience of the translator, instead of rationally examining how the experiment would actually answer the problem presented in the title. And the sentence structure is irrelevant to the word-frequency measure that they are actually testing for. They also spend most of their introduction explaining that there were several different translators who put their name on the Woolf translation, and they see this as a positive, instead of being concerned that different signatures of the translators would interfere in their capacity to distinguish Woolf's from a single translator's style. Then, there is basically no concrete analysis of how Woolf's style differs from the translators, and the conclusion is digressive as it philosophizes, as if letting the diagrams speak for themselves. There is no raw data, or even data summaries outside of the diagrams. It is an absolutely horrid study that nobody should seriously consider when deciding if translators and authors might intersect in linguistic style. I downloaded this paper from Academia for free.

1103faktorovich
Apr 30, 2022, 2:27 pm

>1095 Petroglyph: "The end result is a matrix with dimensions 150x150 and 11,175 cells, which can be translated into a distance graph. For reference: Faktorovich looks at the six most frequent words. In absolute terms!!! All she's got is the, is, of, in, to" No, I look at every single word within every word in every one of the texts I evaluate, and from all of these derive the unique 6 most-common words in each text. Then, I compare these 6-word patterns for each of the texts against all of the other texts, in the Renaissance, the matrix is 284X284 texts, or 80,656 comparisons. I do not extract a maximum of "10K" words per text to avoid all possible selection-bias in choosing some words, but ignoring others while attributing an entire text; I test all of the words in all of the texts. It would also be impossible for any statistical analysis to compare all word-frequencies against all word-frequencies in even just two texts because one of them might have some words, while another does not have those, and has others; the words that neither of them have, or both of them have, and the mixtures between these would be impossible to give equal value without simplifying the data into a test that can be simply applied to all texts. This is why I only look at the 6 most-frequent words and compare these between all texts; these 6-word patterns can be compared between texts to arrive at a simple match-or-no-match or 0/1 binary result. I present the full data of these exact most-common words in all of the texts, whereas none of the studies Petroglyph is citing or is doing in this discussion present all (or any) of the raw data for which words were given which weight or matched which words in all of the tested texts. When I have tried asking for this data, I found lists of all of the words in a text ranked by their appearance or not in other texts; but when all this data is combined it tends to create miss-attributions because there is too much randomness when all of the words in a text are tested; that's why these computational-linguists just present pretty graphs that make it seem as if results were perfect, and then color them with flowery language without addressing the obvious lack of linguistic distinction or explanations for overlaps in similarity. The 6-word patterns do not present any of these problems, as all readers can check for themselves by studying or re-testing my data.
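(In code terms, the top-6-words comparison described here boils down to something like the minimal sketch below. It is only an illustration of the idea, with a toy two-text corpus rather than the actual spreadsheet workflow, and it treats the six words as an unordered set, a detail the description above leaves open:)

```python
from collections import Counter
import re

def top6_words(text):
    """The 6 most frequent word tokens in a text (ties broken arbitrarily)."""
    words = re.findall(r"[a-z']+", text.lower())
    return set(w for w, _ in Counter(words).most_common(6))

def top6_match(text_a, text_b):
    """Binary match-or-no-match: do two texts share the same top-6 word set?"""
    return int(top6_words(text_a) == top6_words(text_b))

# Toy corpus; a 284-text corpus would yield a 284 x 284 grid of such comparisons.
corpus = {"text_a": "to be or not to be that is the question",
          "text_b": "the play is the thing wherein to catch the conscience"}
grid = {(a, b): top6_match(ta, tb)
        for a, ta in corpus.items() for b, tb in corpus.items() if a != b}
print(grid)
```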

And if your testing method leads to different results by just re-shuffling the words out of a text that you have selected, this is already an extreme opening for bias that must disqualify the method as anti-scientific. Researchers who want to test a given method should arrive at the precise results you came up with if they replicate the steps; this is how they would test if there was an error, or a deliberate misrepresentation. This check is inaccessible if the data is designed to be chosen at random, and thus to change at every re-trial.

As for the study of the "three emblematic examples of words that are not common in..." These are basically rare words that represent favorite strange words of these authors. I tested similarly for patterns of rare words when I tested for slang words like "beshrew" and "monger" (the results are available on GitHub in the "Structural Elements in Shakespeare" file). Such words have to be consciously chosen to test for personality traits of authors, as these are not the most common words. My approach is more scientific as I tested for slang usage or avoidance, and this is a known personality indicator, whereas random common words might be related to the subject of a specific text, and not indicative of an author's broader character.

The multiplicity of the outcomes in this study indicate the researchers failed to dig far enough into this question to have actually solved the underlying attribution mystery (whodunnit?). The difference between multiple authors working together, and a single author working under a pseudonym is so wide that basically these researchers have not reached any findings that would further them in answering an attribution question. The point regarding the royalty payment is curious, but why is it left as a hint and not driven to a firm attribution statement? This is why I avoid current attribution mysteries, in favor for the distant past where all financial statements have been disclosed, and nobody is left alive to defend their privacy. There is a broad contradiction between the uncertainty of the bulletin-points in the Conclusion, and then the sudden certainty in the final paragraph that claims the authorship mystery has been precisely solved (though the researchers don't want to impose their opinion). This type of disagreement within a conclusion indicates the researchers' intent to confuse readers with contradictory findings, to leave them room to argue they meant to say the opposite if their conclusion is challenged.

1104faktorovich
Apr 30, 2022, 2:38 pm

>1096 Petroglyph: You are again forgetting that my data does not show a percentage of similarity, but rather a percentage of tests on which any two texts are within 18% of each other across an entire corpus. A match for any two texts is a roll of the dice that is extremely unlikely if results are random (18% is 11 points smaller than even the minimum 29%; random results would have led to the same number matches on average for all ghostwriters, and not the clear clusters in the actual data). As I explained previously, in a highly collaborative corpus like the 6-ghostwriters Renaissance corpus, when two or more ghostwriters co-write a text together, their combined similarity would be not only the 29%+ for one of them, but combined with the 29% for the other; and thus jointly over 58% of similarity. I explain all instances where bylines are contradicted by the underlying linguistic signatures with other evidence of textual mismatches aside from the 27-tests; for example, you can check that the top-6 3-word phrases are also different between them, proving that the bylines are disingenuous.
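(Purely as an illustration of the mechanics being described, not an endorsement: per test, the "within 18% of each other" rule reduces to something like the sketch below, with made-up feature values. How the 18% is normalised is one plausible reading, since the post does not pin it down:)

```python
def within_18pct(value_a, value_b):
    """Binary 1/0: do two texts' values on one test differ by no more than 18%?"""
    if value_a == value_b == 0:
        return 1
    return int(abs(value_a - value_b) / max(abs(value_a), abs(value_b)) <= 0.18)

# Hypothetical per-test values for two texts (e.g. % passive voice, characters per word, ...)
tests_a = [12.0, 21.5, 3.1]
tests_b = [13.0, 30.0, 3.0]
matches = sum(within_18pct(a, b) for a, b in zip(tests_a, tests_b))
print(matches, "of", len(tests_a), "tests match")
```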

1105Petroglyph
Apr 30, 2022, 5:26 pm

>1102 faktorovich:

I brought up the Forsyth paper not in the context of your research and reading interests in general, but in connection with this specific comment of yours in >1043 faktorovich: "I did not find any such clear-cut changes in an author's style with age, as instead I have found that style remains consistent for a professional author across a career." It's certainly relevant there.

But I'll take the gist of your answer (you've barely glanced at it and decided it's too incorrect for you to engage with), and I'll leave things at that.

1106Petroglyph
Modificato: Apr 30, 2022, 6:07 pm

>1102 faktorovich:

Here are my comments on Faktorovich's review of the Rybicki and Heydel paper "The stylistics and stylometry of collaborative translations: Woolf’s Night and day in Polish".

This paper, in part, deals with a translation of Virginia Woolf's novel Night and Day into Polish. The translation was started by Anna Kolyszko, who died before completing the work; another translator (Magda Heydel) finished the work. One of the goals of this paper is to see whether the two translators are distinctive enough in their respective styles that stylometric software can reliably pick out each contribution.

The authors mention at multiple points that this novel is a rare but useful case study, since "the results of stylometric analysis can be confirmed or denied by the translator herself" (p. 715). The authors mention at one point that "it is notoriously difficult to obtain information on the reasons and details of such translatorial collaborations (very often undertaken by translatorial life-partners) from either the translators or their publishers; usually, looming deadlines for lengthy popular novels are blamed." It is, therefore, an asset to be able to talk to the second translator and have her confirm the results of stylometric analyses.

On to Faktorovich's review, which (TL;DR) is evidence of careless reading; misunderstandings abound.

the central problem of how texts in two different languages can be compared to each other in vocabulary frequency; since the two texts will have entirely different dictionaries of used words.

... Wow. You really have no idea how they do this? They run tests on the books in one language (and hopefully get results that are correct). Then they run the same tests on those books in translation (and hopefully get the same results). You are imagining problems that in your "brisk impressions" are insurmountable, and assuming that the professionals are limited by the same standards.

One problem is that the researcher is fixating on the personality or experience of the translator, instead of rationally examining how the experiment would actually answer the problem presented in the title

You know, there's a rich history of theoretical debates in the body of literature that is translation studies -- questions of how much the translator's voice ought to shine through; whether a translated text ought to retain a "taste" of the original language or not; should a translator aim to be recognizable in the work she delivers; etc. etc. etc. Papers of this nature need to include a brief discussion of the relevant literature that this paper is a contribution to. If only to show they've "done their homework", so to speak, and let readers know that, yes, the authors are aware of the relevant literature.

Different sections of a paper have different kinds of relevance, for different kinds of interest.

Once again, you demonstrate unmistakably that you have no experience with actual research literature, and that the mental categories you apply to papers are frighteningly stunted.

(btw, when you say "the researcher is fixating", that should be "the researcherS ARE fixating". There are two authors to this study.)

{Faktorovich quote-mines}

You left a crucial part out of the bit you quoted, which I've added and underlined:

Thus the changes Heydel {the second translator} introduced into Anna Kolyszko’s text {the first, deceased translator} were not (or very rarely) lexical but mainly syntactical. She worked with the famously long and intricate Woolfian sentences, the more so that the Polish language, with its extremely flexible sentence structure, locates most of its rhetorical and pragmatic devices here. Also for this reason, most-frequent-word analysis was a well-suited approach to this experiment in translatorial attribution.


You take this as "all they say" about the comparisons that you claim the authors did between English and Polish texts. Well, for one thing, that is a clear lie. Most of page 711 deals with that (but we'll get to that in a bit).

That quote means that MFW-analysis was a good tool to compare the two translators with, given that Polish uses syntactical means (instead of lexical means) to accomplish various rhetorical and pragmatic effects, and the changes that the second translator made to the first translator's text were mainly syntactic in nature.

Secondly, by leaving the underlined sentence out of your quote, you've made it very unclear what the "locates most of its rhetorical and pragmatic devices here" refers to.

I'm pointing this out because your later comment that thE sENteNCe stRUCtUre IS iRrELevAnt TO thE WoRd-fReQUeNcY MeAsURe thAt thEY Are aCTUaLLY tEStinG foR shows that you just *did not grasp* the relevant distinction between syntactic vs. lexical means of achieving pragmatic effects. It is much clearer when you also add in that underlined sentence. The authors state that, since the second translator did not change much of the lexical stuff contributed by the first translator (but changed a bunch of syntactic stuff), they were able to use lexical measures to differentiate the two translators.

an absolutely erroneous and irrational method of comparing Polish versus English texts

So "absolutely erroneous" that it correctly separated twenty-three novels in English (the graph to the left in the below image), and assigned most of the translations to the correct author, too (the graph to the right; translator's initial at the end of the file names) suggesting that the translator's imprint is weaker than the author's.



They never directly address the central problem of how texts in two different languages can be compared to each other in vocabulary frequency

That's a lie, Faktorovich. Here is how Rybicki and Heydel introduce the figures I screenshotted just now:
In even more practical terms, a script by Maciej Eder, written for the R statistical environment; converts the electronic texts to produce complete most-frequent-word (MFW) frequency lists, calculates their z-scores in each text; selects words for analysis from various frequency ranges; performs additional and optional procedures for better accuracy (culling and/or pronoun deletion); compares the results for individual texts; produces Cluster Analysis tree diagrams that show the distances between the texts; and, finally, combines the many tree diagrams made for various parameters (number of words, culling rate) in single a bootstrap consensus tree. The script was demonstrated at Digital Humanities 2011 (Eder and Rybicki, 2011) and its ever-evolving versions are available online (Eder et al., 2011).
To illustrate this development of the original Burrows Delta procedure and, at the same time, the phenomenon of the dominance of the authorial signal over that of the translator, it is worthwhile to consider the fairly typical case of two corresponding sets of texts: a collection of twenty-three English novels in the original (Fig. 1) and the Polish translations of the same novels by a variety of translators, identified by their initials (Fig. 2). As can be seen, the script’s procedure guesses the originals perfectly; and while the guessing of the authors in translations is somewhat less perfect, the authorial signal is much more marked than that of the translators; of the three represented by more than one book in the set, the translations of only one (rc) have been clustered together

That is nearly all of the right-hand column on page 711.
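For readers who have never seen this Delta-style workflow in the wild, here is a minimal Python sketch of the same general idea (corpus-wide most-frequent words, z-scores, distances, cluster analysis). It is emphatically not Eder's actual R script, which lives in the stylo package; all names and parameters below are mine:

```python
import numpy as np
from collections import Counter
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram

def mfw_matrix(texts, n_mfw=200):
    """Relative frequencies, per text, of the corpus-wide n_mfw most frequent words."""
    tokenized = {name: t.lower().split() for name, t in texts.items()}
    corpus_counts = Counter(w for toks in tokenized.values() for w in toks)
    mfw = [w for w, _ in corpus_counts.most_common(n_mfw)]
    rows = []
    for toks in tokenized.values():
        counts = Counter(toks)
        rows.append([counts[w] / len(toks) for w in mfw])
    return np.array(rows), list(tokenized)

def delta_distances(freqs):
    """Burrows-style Delta: z-score each word column, then mean absolute difference."""
    z = (freqs - freqs.mean(axis=0)) / (freqs.std(axis=0) + 1e-12)
    return pdist(z, metric="cityblock") / z.shape[1]

# Usage (hypothetical corpus of plain-text novels keyed by "author_title"):
# freqs, labels = mfw_matrix(novels)
# dendrogram(linkage(delta_distances(freqs), method="average"), labels=labels)
```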

instead of rationally examining how the experiment would actually answer the problem presented in the title

Page 713 has the following graph:



(the surrounding pages talk about this graph)

This tree shows all the chapters of the Polish translation of Night and Day. Kolyszko (deceased) was the main translator of chapters 1-26; Heydel took over from chapter 27 onwards. The tree shows, indeed, a clear separation between these two groups of chapters.

This graph was confirmed as correct by translator Magda Heydel. Curious, since Faktorovich thinks that this paper uses "an absolutely erroneous and irrational method".




At some point, a careful reader might have noticed that the name of the second Polish translator, Magda Heydel, is very similar to one of the authors of this study, who is also called Magda Heydel. One can assume that someone doing a "review" of this paper would have noticed this. Right?

I would argue that in a body of literature dedicated to discussing the way that an original author's voice and those of their translators intersect and interweave, it is an asset to have an actual, experienced translator contribute with insights on how they actually approach their translations.

That's what I would do. But I'm no Faktorovich. Thank Cthulhu.

1107Petroglyph
Apr 30, 2022, 5:40 pm

Vote: Do you think that >1103 faktorovich: has successfully debunked the T&C Ferrante paper linked in >1095 Petroglyph:?

Current tally: Yes 0, No 27

1108Petroglyph
Apr 30, 2022, 6:14 pm

>1097 Stevil2001:

Thanks! If only my asset appreciation in the meme economy would translate to real-life gains!

1109Petroglyph
Apr 30, 2022, 6:16 pm

>1098 lilithcat:
I don't think T&C have considered this possibility. Perhaps someone should tell them.

Faktorovich might have thought of this, though. At least, I wouldn't put it past her based on her interpretation of "ghost-writer" in >414 faktorovich:.

1110Petroglyph
Apr 30, 2022, 6:18 pm

>1099 Matke:
That's very kind of you to say; thanks for your warm words!

As long as I'm interested and can avoid repeating myself too much, I'll keep reading papers and creating exploratory graphs. (My interest, one might say, is piqued, but has not yet peaked.) Cheers!

1111Petroglyph
Apr 30, 2022, 6:29 pm

>1101 faktorovich:

re: memes.

Sure, if that makes you feel good about yourself.

re: Parts-Of-Speech tagging, even on smaller and manageable samples in your corpus:

If I had any uncertainty in my attributions, I would have evaluated these and other strategies before settling on the testing method. Without such uncertainty, I am just pointing out briefly to you that I have explored these possibilities when I was choosing the tests that are best to apply, and decided that these strategies were not productive.

Sure, if that makes you feel all scientific. It wouldn't be the first time you've handwaved away some test or methodology that's too hard / mathy / complex or too labour-intensive for your manual copy-and-paste workflow.

1112faktorovich
Mag 1, 2022, 1:49 am

>1106 Petroglyph: If a researcher is at all relying on asking a potentially implicated ghostwriter-hirer or the like regarding complicity in such a potential fraud; then, the underlying research findings are entirely unreliable, as they are likely to have been checked with the implicated "author", whose opinion or cheerfulness might have been considered before reaching an attribution conclusion. In response to: "...The results of stylometric analysis can be confirmed or denied by the translator herself" (p. 715)."

"They run tests on the books in one language (and hopefully get results that are correct). Then they run the same tests on those books in translation (and hopefully get the same results)." This statement is absolutely illogical and nonsensical in practice. If you take any book in any single language and run tests on it, it would be absolutely impossible that the output data for the same book in any other language would be even relatively similar. This is indeed a case of comparing apples to oranges, while insisting that everybody knows that apples and oranges have always been compared and if they are weighed, measured, have their calories added up, have their seeds added up and various other elements; it is absolutely possible and likely they would show similar results. In the case of Woolf and translators; it's not even the same people writing in the two different languages.

It is an ideological dream that "translated text ought to retain a 'taste' of the original language or not". In reality, most translations are extremely unlike the original text they are working from. I explain some problems that can be introduced in a review of: Nikolai Gogol; Susanne Fusso, translator, The Nose & Other Stories (New York: Columbia University Press, September 1, 2020). $17.95. 368pp. ISBN: 978-0-231-19069-5. https://anaphoraliterary.com/journals/plj/plj-excerpts/book-reviews-summer-2020/

Then, you go in circles around the claim that the researchers were able to figure out that the second of the translators made "changes... to the first translator's text" that "were mainly syntactic in nature." Syntax is basically the order and organization of words in sentences. So how could the researchers have figured out precisely what type of changes each of the editors made unless they had several versions of the translation across the different stages as the translators worked on it. It might have been noticed if a draft or a first edition was published before the second translator edited the second edition, but this was not the case. So since the degree of syntax-editing by any one translator cannot be checked it should not be the center of the discussion. As it stands you are claiming that they used their assumption regarding who edited syntax to cyclically prove that there were two different translation style; you are assuming they knew the precise syntax use to make this attribution, when the basis for such knowledge has not been established in the article.

If an analyst has the assigned bylines of text; the easiest thing to do is to manipulate the data or a diagram to make it seem as if the output of an attribution experiment is to re-affirm precisely the established bylines everybody already believes in. Such replication of known bylines does not at all prove any accuracy of the applied method.

The section you quote summarizes the standard Stylo etc. attribution method with word-frequencies. It does not explain how two sets of frequencies of entirely different words (because they are in two different languages) can be compared to each other to check for similarity or divergence. The second paragraph in that quote then adds that the tests were applied to "twenty-three English novels in the original (Fig. 1) and the Polish translations of the same novels by a variety of translators". What is this statement even trying to say? If these were just 23 random English novels, why are they stressing that they are "original". And then, they took these 23 novels translated into Polish. And at this point, these authors did not think it was necessary to explain how they adjusted the tests for them to be at all applicable to texts in 2 different languages? They appear to have tested these separately, and their conclusion is that the English texts were accurately attributed, while the translated texts were inconsistently assigned. One reason for such inconsistency is if a single ghostwriter translated multiple English novels into Polish. If the precise data had been provided for who matched whom, and with what numbers/ frequency-counts; then it would have been possible for readers to check what these results actually indicate. Instead of addressing these important details, the author digresses into irrelevancies.

1113Petroglyph
Mag 1, 2022, 4:44 am

>1112 faktorovich:

If a researcher is at all relying on asking a potentially implicated ghostwriter...

Paranoid crank is paranoid. Yawn.



You quote me as saying "They run tests on the books in one language (and hopefully get results that are correct). Then they run the same tests on those books in translation (and hopefully get the same results)." Then you add thIs stATemeNT is aBsOLuTeLY iLloGICal And nONSenSiCAl iN PracTICe

Why? The analyses on the 23 English novels and their translations happen in two separate steps. They first run the analyses on the novels that were originally in English, and see if the software can correctly assign them to their authors. (It can.) Then, they take the Polish translations, and see if the software can still separate the books into author-clusters, or if the translator-clusters take over. Surely, researching whether an authorial voice can survive translation (across multiple translators, even) -- or not -- is a topic worthy of scholarly attention? What is so confusing about that? What is so impossible about that?
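(If it helps, the two-step design amounts to: run the same pipeline twice and check how well the clusters recover the known author labels each time. A minimal sketch, assuming you already have a texts-by-features frequency matrix and author labels; the names here are mine:)

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

def author_recovery(freq_matrix, author_labels, n_authors):
    """How well do frequency-based clusters line up with the known authors? 1.0 = perfect."""
    clusters = AgglomerativeClustering(n_clusters=n_authors).fit_predict(freq_matrix)
    return adjusted_rand_score(author_labels, clusters)

# Step 1: score the English originals. Step 2: score the Polish translations of the
# same novels. If the second score stays high, the authorial signal survives translation.
```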

I genuinely don't understand what your problem is here. Maybe you could spell it out for me, using small words and short paragraphs?

I do sense your anger at being massively outclassed, though. So maybe it's that.

This statement is absolutely illogical and nonsensical in practice. If you take any book in any single language and run tests on it, it would be absolutely impossible that the output data for the same book in any other language would be even relatively similar.

Goddammit, it's your one-track mind again.

Look. Your highly idiosyncratic and deeply misguided sham of an "analysis" involves language-specific things like rate of passive voice, sentence length, syllable count, the six most frequent words and the six most frequent letters, etc. I understand that, if you were to run your "analysis" on an English text and its translation into French or Polish or Japanese, you'd end up with different figures for syllable counts and the top letters and words and whatnot. Sure. But that is your fault for making your tests so language-dependent. And it's your fault for treating absolute garbage like "top six letters" as a diagnostic feature.

And if the professionals were limited to your imagination, that's where the thing would end. Fortunately, techniques such as cluster analysis and multi-dimensional scaling and principal component analysis do not care whether the input is in English, in Polish, in Mandarin, Ancient Greek, or, indeed, genetic data, traffic data, medical trials, meteorological data, ...
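(Case in point: dimensionality-reduction and clustering routines only ever see a numeric matrix. A minimal sketch with stand-in random data, just to show that nothing in the call cares what the columns represent:)

```python
import numpy as np
from sklearn.decomposition import PCA

# PCA doesn't know or care what the columns are: English word frequencies,
# Polish word frequencies, gene-expression levels, traffic counts...
rng = np.random.default_rng(0)
feature_matrix = rng.random((30, 500))        # 30 "texts" x 500 "features" (stand-in data)
coords = PCA(n_components=2).fit_transform(feature_matrix)
print(coords.shape)                           # (30, 2): ready to plot, whatever the language
```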

It is an ideological dream that "translated text ought to retain a 'taste' of the original language or not".

I'm glad you've sorted that out for the translators. I'll let you do the honours of informing them. To think that a single blog post of yours in which you trash a translation you didn't appreciate could just casually settle this entire debate! Seriously -- why do translation studies even need multiple journals? They could have just asked you!

the researchers were able to figure out that the second of the translators made "changes... to the first translator's text" {...} they used their assumption regarding who edited syntax

Did you know that the second translator is one of the authors of this study? She wasn't "able to figure out" what she did. She didn't "assume". She... remembered? You know, used her brain? To recall the work she did on a book?

Syntax is basically the order and organization of words in sentences

Yes, that is how it's explained in secondary school. And for the first sentence in an undergrad textbook, that would do as well. But you really need to go beyond that, Faktorovich. Syntax is a little more complicated than that (especially in Slavic languages). But you've made your absolute lack of understanding of modern-day syntax clear in >821 faktorovich:.

to cyclically prove that there were two different translation style; you are assuming they knew the precise syntax use to make this attribution, when the basis for such knowledge has not been established in the article

... The researchers used analyses based on Most Frequent Words. They make the point explicitly that a) two translators each did a major part of the book, and b) the second translator's changes to the first translator's work were mainly syntactical in nature. Therefore, the researchers were able to use MFW-based techniques to separate the two translators, since the second translator's changes had left the first translator's lexical material largely intact. Intact enough for a successful separation between the two styles.

This point is very clearly stated in both the article and >1106 Petroglyph:. I cannot see an honest reason for why you've missed it twice now.

1114Petroglyph
Mag 1, 2022, 4:51 am

Ok, time to switch gears from a very, very tedious back-and-forth to something perhaps a little more amusing.

i just wanted to report on some more of Faktorovich's anti-stratfordian rubbish that I came across elsewhere.

A few months ago she briefly altered the Wikipedia page for the List of Shakespeare authorship candidates in an attempt to enshrine her kooky poppycock there. This happened in late January, when this thread was on hiatus.

When her kooky poppycock was removed nine minutes later, she reverted that reversion six minutes later, commenting that:
Without my edits, this page has major errors in its comprehension of my study that attributes 5 different authors as responsible for writing different "Shakespeare" texts. I also add corrections in the citation method. By undoing all of my edits this editor is deleting common-sense editorial changes that are factually true and do not reflect any bias

Yeah: her style is instantly recognizable. Her nonsense was also removed again two minutes later and has not been re-added.

That drama was then taken to the wiki talk page about Shakespeare Authorship Candidates -- a lot of familiar-sounding drama. Warning: that discussion just keeps going and going and going. Some more drama spilled over onto her own talk page.

It's exactly what you'd expect from her behaviour in this thread: she's miffed at not being taken seriously, confused at how her publication and citation history does not grant her instant respect/inclusion/noteworthiness; she complains about misogyny/antisemitism etc., offers her books to editors so they can read them and agree with her, touts her self-published credentials, excoriates editors for using anonymous pseudonyms, lists her Google scholar references, and all the other Greatest Hits. Basically: a lot of pleading to please take her seriously no really she's a scholar and has overwhelming evidence just read here she's an expert.

Other wiki editors express confusion at her walls of text, her quick-fire dispute-every-point comments, try and remain non-committal, and stand firm by WP:SPS, which prohibits using self-published sources to prop up fake expertise. She keeps bumping into that one. WP:EXTRAORDINARY is brought up, as well: extraordinary claims require extraordinary evidence.

LibraryThing gets some mentions, too. It appears that Faktorovich is claiming that the interview here is part of what makes her noteworthy enough to warrant inclusion on wikipedia.

A few lol-worthy comments:

  • You, and other Wikipedia editors are actively involved in censoring the correct bibliographic citation of my research, after one of your editors posted an erroneous citation of my study. So why are you asking me about who can possibly be censoring me, while you are involved in this very act.
  • all researchers have suffered {from censorship} by not having access to the true attributions my study has uncovered
  • Are you challenging me to find out who is QAnon? Would it take me proving a current case for you to believe my findings about the Renaissance and 18th-20th century?
  • I think you make a horrid Wikipedia editor, especially on the subject of authorship because you do not clearly identify yourself on your User Page, and thus fail to allow other users to check if you have conflicts of interest especially on the subject of attribution
  • I have discredited all previous computational-linguistic studies among the hundreds I have reviewed so far in BRRAM. I am sure before I read the article you are referring to that it is erroneous. WhatamIdoing mentioned that they only analyze character-counts for 4 letters. This is just absurd. I will briefly glance at these articles and I'm sure it's going to be as erroneous as the previous hundreds of such studies I have countered. What you guys should really do is actually read Volumes 1-2 of BRRAM and perhaps the rest of it, where I explain why you are wrong to blindly trust where an article is published vs. what it says


Editor bonadea sums it up nicely, I think:
Since you are not interested in helping the rest of us build an encyclopedia, this conversation is pointless, perhaps apart from the fact that it shows very clearly that you have no expertise in stylometry, or even in linguistics in general. That is fine as long as your edits to Wikipedia are firmly based on reliable secondary and independent sources that have zero connection to yourself, but it doesn't look like that is something you are interested in doing. Editors also need to be able to understand the sources and to represent them correctly – the post above doesn't show either of those things. A mindset that is open for the possibility that other people might be right is also crucial. Most people are not interested in being Wikipedia editors, and that's also fine. Good luck with your future work. And thanks for linking to that external discussion forum where your method was discussed

1115Petroglyph
Mag 1, 2022, 5:04 am

>1078 norabelle414:

Perhaps. Is it worth it, though, to switch forums and lose all the "institutional knowledge" that's been built up here? (We must be approaching 300,000 words now.) Would the poisoned nature of this thread carry over into the new one?

1116MrAndrew
Mag 1, 2022, 7:13 am

"I am sure before I read the article you are referring to that it is erroneous"

Outstanding.

1117MrAndrew
Mag 1, 2022, 7:35 am

"You are not interested in improving Wikipedia, but rather advertising whoever you are ghostwriting for who pays you to puff them on Wikipedia."

Every time you think this can't get any better, it surpasses itself.

1118MrAndrew
Mag 1, 2022, 7:57 am

"As for the Director of Penguin, I don't think the Director of Penguin selfpublished a book that they demand should be treated as an academic publication. "

I'm really down the rabbit hole now. And I'm not even quoting the QAnon stuff. I blame you, >1114 Petroglyph:

1119MrAndrew
Mag 1, 2022, 8:06 am

"Please read WP:BLUDGEON" and then bludgeon us with text walls about it. Priceless.

1120lilithcat
Mag 1, 2022, 9:54 am

"I was planning on making these edits across Wikipedia's British Renaissance pages . . ."

Oy.

Thanks for the time sink, Petroglyph!

1121norabelle414
Mag 1, 2022, 11:37 am

>1114 Petroglyph: It appears that Faktorovich is claiming that the interview here is part of what makes her noteworthy enough to warrant inclusion on wikipedia.
Horrifying (and predicted way back at the beginning of this topic)

>1115 Petroglyph: This topic will still exist if the conversation is moved to another, but it is becoming too slow to load, which would be a greater loss. At some point the conversation will have to move somewhere else.

1122amanda4242
Modificato: Mag 1, 2022, 1:11 pm

"There were 651 comments about my series and findings in a LibraryThing discussion"

OMG. She actually linked to this thread like it would make her look good?! This thread is literally hundreds of posts of people telling her she is wrong and hundreds of posts from her in which she utterly fails to prove anything except her own ignorance of every subject she mentions.

1123lilithcat
Mag 1, 2022, 1:12 pm

>1122 amanda4242:

She seems to think that "comment" means "cited with approval". Not surprising, though, from a person who will "use the term “ghostwriter” in a unique variant from the current dictionary-definition", who doubles down on her confusion of "faze/phase" and "reins/reigns", rejects the definition of "insect" as being an animal, etc.

Apparently, words mean whatever she wants them to mean at any particular moment.

1124Petroglyph
Mag 1, 2022, 1:27 pm

>1118 MrAndrew:
>1120 lilithcat:

I saw, I scrolled, I shared. Others have to suffer along with me.

>1121 norabelle414:
Sadly, not an uncommon occurrence among the anti-science types. Any semi-mainstream attention is leveraged to promote the woo.

>1122 amanda4242:
>1123 lilithcat:
And her Google Scholar citations, too! Many of which are references to her own work from her self-published "review" blog.

1125faktorovich
Mag 1, 2022, 1:34 pm

>1113 Petroglyph: I do not understand why the image of a bird is derisive. If I could be reborn as any other living creature, a crow is my top choice. So you are plagiarizing somebody else's cartoons, and placing your own insults inside of place-holders. I think a better meme creator should at least draw their own cartoons, so they precisely express the satirical comments one intends to make. I mean, crow is a symbol of death. A rabbit is the more stereotypical symbol of fear. And BRRAM does not re-attribute the Renaissance to my byline, but rather to six other bylines. Satire applies exaggeration to truthful statements; if you make up a false claim and attribute it to me (as I said before) this is just libel and not satire.

The author of this article has buried the description of what they actually did in this experiment in the Results, instead of starting the article by explaining they did not compare Polish to English texts, but rather Polish and English texts separately. Most of the introduction should be cut out and these types of simple explanations need to be placed there. And the title of this article is also inappropriate for the contents, as it is not an experiment about just Woolf, but rather about a group of variedly bylined texts. The introduction and most other parts focus on Woolf and the varied translators, instead of explaining that the experiment was not only considering the Woolf byline. And Figure 2 shows a chaotic outcome, which is also expressed in the conclusion. The RC translator has two novels in different bylines that match each other. And as I concluded in my study, the E. and C. Bronte texts match each other, even when different translators work on them. But then, most of the the other bylines still match when translated in Polish, even when different translators worked on their translations. Given the former contradictory findings, only if the translators' bylines are incorrect because some of them were ghostwriting would it be possible to actually get these results without manipulating data.

Yes, all computational-linguistic methods that work must by-definition involve "language-specific things". For example, there are 211 languages, such as Abipón, that do not have a passive voice construction in their grammar. The rest of the list is here: https://wals.info/feature/107A#2/18.0/148.9 If I was performing tests on one of these languages, I obviously would not have used passive voice as a measure. I did not test for some punctuation marks, or for words-per-paragraph when applying my method to the Renaissance because of transcription glitches in these areas, and the use of many line breaks in plays and poetry. I did use words-per-paragraph and the other measures when testing texts in the 18-20th centuries. If a method is completely insensitive to these types of variations in languages; it is unlikely to work as an accurate attribution method.

The most "language-specific" thing imaginable is the dictionary of words that are specific to that language. So if you apply the same dictionary two different languages, you are getting completely nonsensical results for one of these tests. Even checking the letter-frequency would be more likely to show a bit more similarity between texts in different languages than checking word-frequency, as the latter would logically show near-zero similarity. The article in question avoids this complete stoppage by testing the texts in the different languages separately. However, this does not really check the ability of a translated text to be identified to its original author despite the language change, because it depends on trusting if the translators' bylines are accurate. Thus, it is important to test many more texts by each translator with multiple bylines to check the authenticity of their authorship attributions.

"The second translator is one of the authors of this study". This point raises the level of bias in this article to the maximum possible level. This is similar to trusting "Shakespeare" to establish his own bylines, or trusting Harvey's (a member of the Workshop) annotations about "Hamlet's" authorship, or trusting "Nashe's" (Verstegan's) claim about over 10,000 attendants at a single staging of a drama. Only it's worse because it would be more like if William Percy had written a scholarly article as a co-writer wherein he proved that he was not the author behind "Shakespeare" and that "Shakespeare" was indeed a real person, and humanity had continued to quote Percy on this "fact" for the following 500 years. You cannot study the attribution of your own translation. If you have done any ghostwriting in the translation field, you would obviously be self-interested to confuse the results.

The definition of syntax does not change in Slavic languages. Syntax in Russian is: синтаксис. Definition: Отдел грамматики, изучающий предложения и способы сочетания слов внутри предложения. Translation: A section of grammar that studies sentences and how words can be combined within a sentence.

Even if the translator who wrote this article submitted all of her drafts to show which words she changed syntactically, there would be no rational relationship between such changes and the word-frequency analysis that they actually applied to check for authorship attribution. This is all indeed nonsensical, or lacking in any rationale beyond double-speak.

1126faktorovich
Mag 1, 2022, 1:48 pm

>1114 Petroglyph: A Wikipedia editor had previously added an entry about my BRRAM findings on the "Shakespeare Candidates" page, but he only added one of the 5 ghostwriters that I credited with different "Shakespeare" texts in my study. I corrected this error by adding lines for the other 4 ghostwriters, so that it would not appear that my findings were only crediting William Percy as "the William Shakespeare". My attribution of this byline to 5 different ghostwriters proves the lack of a real "author" behind this byline, whereas the false claim that I only attribute it to 1 byline reinforces this long-held false belief, and merely shifts who the "real" "Shakespeare" was to a different byline. This was a simply citation error, as the Wikipedia editor had simply failed to actually read the conclusions of BRRAM, and I wanted to correct this misunderstanding. I was not committing a self-citation error because the Wikipedia editor had already cited my study and I was only correcting his misrepresentation of my claims. They ended up correcting some of the citation by changing the website link the Wikipedia editor had before to my Anaphora page to instead citing a Wichita newspaper article about my findings. I stand behind every word I said in this discussion, and I firmly believe Wikipedia should change its rules to be fair to independent researchers, should standardize their definition of "expert", and should reveal the identities of their editors so that their biases are public. You are all welcome to go to these pages to take a look at the discussion, but all except for my own Talk Page have been blocked to all further comments by Wikipedia's editors, so that even if you wanted to comment on these points, you cannot do it on Wikipedia's public pages.

1127paradoxosalpha
Modificato: Mag 1, 2022, 1:49 pm

Boy, I'm glad I had a stranger sitting next to me while I wrote my books and papers so that nobody could ever impugn my "biased" bylines. What author can be trusted to lay claim to his own work? The world of words is an evil racket.

/s

1128Petroglyph
Mag 1, 2022, 2:42 pm

>1125 faktorovich:

"The second translator is one of the authors of this study". "ThE SeCoNd tRaNsLaToR Is oNe oF ThE AuThOrS Of tHiS StUdY". ThIs pOiNt rAiSeS ThE LeVeL Of bIaS In tHiS ArTiClE To tHe mAxImUm pOsSiBlE LeVeL.

It took you until >1113 Petroglyph: to grasp that point, though. You missed it while reviewing the article!! And again when I pointed it out in >1106 Petroglyph:. It had to be presented to you as a single, separate, stand-alone point before you acknowledged it.

And instead of admitting a mistake, you jump straight-away to more shit-flinging on an unrelated topic. More DARVO.

The author of this article has buried the description of what they actually did in this experiment in the Results, instead of starting the article by explaining they did not compare Polish to English texts, but rather Polish and English texts separately

In other words: you, who "reviewed" this article, made up a ridiculous scenario in your head, behaved as though the authors had committed the same stupidity, and wrote a dismissive bit of ignorant babbling that you expect me to take seriously. What the authors actually did is clearly stated in the paper. That exact same bit was quoted in >1106 Petroglyph:, which you commented on in >1112 faktorovich:. I had to explain it again to you (in >1113 Petroglyph:) using small words and overly explicit phrasing. It takes that much hand-holding for you to grasp a central feature of an article you reviewed.

And you still missed it. You missed it twice. You continued to behave as though a confusion that had its origin in your mind was, somehow, the authors' fault. And when someone very carefully takes you by the neck and rubs your face in what you misrepresented, you immediately switch to how it's really the authors' fault again for causing you to misinterpret their title.

I'm reminded of The Narcissist's Prayer:
That didn't happen.
And if it did, it wasn't that bad.
And if it was, that's not a big deal.
And if it is, that's not my fault.
And if it was, I didn't mean it.
And if I did, you deserved it.

I'd like to propose the Faktorovian Prayer:
That's absolutely impossible
And if it's not, it's illogical,
And if it isn't, it's nonsensical,
And if it isn't, then it should have been clearer,
And if it was, it's still erroneous
You're just picking on me


The definition of syntax does not change in Slavic languages
I wasn't talking about the definition of syntax, but about how Polish syntax (and Slavic-language syntax) is, generally, a bit more involved than English syntax.

thIS PoinT rAisEs ThE LevEl oF BiaS IN tHIS ArTiCLE tO THe maXimum poSsIbLE levEl.

1129prosfilaes
Mag 1, 2022, 7:04 pm

>1078 norabelle414: I Survived the Great Vowel Shift is a linguist-focused group. There's going to be overlap in interested people, but I think it should be a separate group: Literary Computing (thanks, Keeline). Set up at https://www.librarything.com/ngroups/23692

1130faktorovich
Mag 1, 2022, 8:49 pm

>1128 Petroglyph: If a reviewer of an article cannot "grasp" what the article is about simply from reading the abstract; the fault is entirely with the author who has failed to digest the abstracted meaning of the article for his or her readers. Hiding the central point of the study in a minor note towards the end is a deliberately subversive action designed to prevent comprehension and thus criticism; or alternatively it can be simply the result of a sloppy researcher that does not know an abstract is supposed to summarize the article. The failure to call the article after the main subject in the article is also entirely the fault of the writer, who might have some financial or academic-credit in mind instead of putting the conveyance of precise meaning as the top priority.

You then go on a tirade about how the nonsensical nature of the article is the fault of the readers who fail to understand or to find the meaning in such nonsense; across this tirade you fail to address any of the problems with the article you asked me to review that I carefully explained.

When you start writing prayers or poems about a scholar, that scholar has clearly hit a nerve and has communicated something new to you that has radically shifted your understanding, so that there is something for you to poetize about. Nobody writes poems about that horrid scholar who didn't say anything meaningful.

Statistically speaking, there is a 100% chance that there is academic fraud somewhere in academia because the College Admissions Scandal is one of dozens of legal cases where academic fraud led to convictions in Court. A conviction is how one proves something to be true in Court. A 1% chance would mean there were no cases of convictions for academic court ever anywhere in the world, but there were occasional complaints that were never prosecuted.

1131faktorovich
Mag 1, 2022, 8:51 pm

>1129 prosfilaes: Let me know if you post any questions for me there, and I'll respond there.

1132Crypto-Willobie
Mag 1, 2022, 9:17 pm

Well, I missed 1040 and 1066 and 1111. I'm holding out for 1215.

1133Keeline
Mag 1, 2022, 9:45 pm

>1130 faktorovich:

Statistically speaking, there is a 100% chance that there is academic fraud somewhere in academia because the College Admissions Scandal is one of dozens of legal cases where academic fraud led to convictions in Court. A conviction is how one proves something to be true in Court. A 1% chance would mean there were no cases of convictions for academic court ever anywhere in the world, but there were occasional complaints that were never prosecuted.

That is not at all how statistics works. If you think it does then you should not do any work which relies upon statistics.

One person enters a bank, robs it, and is caught and convicted. 10 million people enter banks every day. That does not mean that the 10 million people who entered a bank are all robbers.

The college admissions scandal was largely a crime of parents and consulting companies who promised to get their kids into the best schools possible, sometimes using dishonest representations of the qualifications or backgrounds of the would-be students. This has little or nothing to do with the work by academics who are just as sincere and dedicated to their topics as you believe yourself to be. In most cases they are even more qualified to talk about their topics for having spent decades keeping up to date on the evolution of their field and the theories which help to explore it. Sometimes this work requires real mathematics, statistics, programming, and the ability to realize when they have started work on a project that doesn't quite pan out as they hoped.

You have stated many times in this thread how you are engaged in "translating" some work or another in your corpus. Yet, you have recently stated

It is an ideological dream that "translated text ought to retain a 'taste' of the original language or not". In reality, most translations are extremely unlike the original text they are working from....


So by translating and then measuring the highly-processed texts, are you not introducing the style traits of your own work onto the texts? Maybe they all look similar because they have all received your similar processing?

I am still astounded that you find no value in the idea, indeed find it "nonsensical", that you could calibrate your tests on texts where you absolutely know the authorship. These tests should be able to tell right away that a work is or is not by one of these authors for whom you have a baseline (otherwise known as a "control" in the scientific method). If you cannot or will not do this, why should any of your results be taken seriously?
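For readers following this methodological point: a "control" run of the kind described is straightforward to set up. Below is a minimal, hypothetical sketch (in Python) of a leave-one-out check on texts of undisputed authorship; the toy feature profile and distance function are illustrative stand-ins, not Faktorovich's 27 tests or any published attribution method.

    # Hypothetical calibration sketch: hold out each text of known authorship and
    # check whether a nearest-neighbour attribution recovers its author. The
    # feature profile and distance below are deliberately simplistic placeholders.
    from collections import Counter

    def features(text):
        """Toy stylometric profile: relative frequencies of a few function words."""
        words = text.lower().split()
        total = max(len(words), 1)
        counts = Counter(words)
        return [counts[w] / total for w in ("the", "of", "and", "to", "in", "that")]

    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def leave_one_out_accuracy(corpus):
        """corpus: list of (known_author, text). Returns the fraction recovered."""
        profiles = [(author, features(text)) for author, text in corpus]
        correct = 0
        for i, (true_author, profile) in enumerate(profiles):
            others = [p for j, p in enumerate(profiles) if j != i]
            predicted = min(others, key=lambda p: distance(profile, p[1]))[0]
            correct += (predicted == true_author)
        return correct / len(profiles)

If a procedure cannot recover undisputed bylines at a rate well above chance on a check like this, that is worth knowing before it is applied to disputed ones.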

Extraordinary claims require extraordinary proof. It has to be clear and convincing. You can't convince anyone by saying that practically everyone 400+ years ago was unable to write the things attributed to them and that only six people were capable of putting quill to parchment.

Combining 27 tests that are not always relevant to the edited literature of 400 years ago, and that are stripped down to a binary result based on a cherry-picked percentage, is not some alchemy of authorship attribution. It is hiding the traits, some of which may be relevant and some of which may not be, under a pile of noise. This is why scientists devise experiments to isolate single measurements, to identify the effect of those individual tests and understand what is going on.

Many kids who get chemistry sets at one time or another start to combine chemicals at random or eventually all of them from the set. This is why the manufacturers of these sets have to be careful about what is included. They need to be able to do interesting and educational demonstrations and "experiments" but not include too much that could cause a damaging reaction. Even still, some kids managed to do so, often because they added chemical from beneath the kitchen sink or from the garage. And now it is very hard to find a chemistry set that does much more than allow for the combination of baking soda and vinegar.

I think many of us reading this thread have been hoping for some kind of reason to look deeper into your efforts on this. But since you seem to want to impugn the research and reputation of as many people here and many more who are not here as possible, it is hard to treat this like a peer-to-peer conversation about a body of work.

For my own part you have made snap conclusions (100% incorrect thus far) about works I have mentioned and about which you know nothing and did not even try to find, like The White Ribbon Boys of Chester. Instead you used it as a starting point for another topic about writers copying other writers, when that has nothing to do with the authorship guess made by me, as an adult, based on similarity to other works by a certain author. This was before extrinsic evidence about the authorship was available, and at the time I made the guess there was no expectation that such information would ever be available. But, when the new information (contractual release and several letters going back and forth between the writer and his employer) did become available, I didn't deny it and say, "no, that couldn't be right." Instead, I did what scientists do and changed my understanding based on new evidence after weighing its value.

I have not been working in my field for just a couple of years. You stated that you only turned to the English Renaissance two years ago and have been working on this for a few years (not specified). I've been active in my field for more than 33 years. It is not the way I earn my living because there's no money in writing about it. Sure, I've sold some articles to slick magazines based on my research, but this kind of work pays little and is not going to fill a gas tank for very long, let alone pay for a home.

That some accomplished people have tried to engage you in conversation about your work for more than 1,000 messages is something that you should appreciate and take a chance to learn from. Instead you treat us like liars and children. None of us has any stake in whether or not there is anything to your research. We have our own lives and careers. Time spent here could be spent on our own interests and work. As it was, I had to step away to make my 27th presentation to the Popular Culture Association national conferences. These don't make me a penny and I have spent thousands of dollars doing the work and attending these conferences around the U.S. (pre-COVID). I do it because I like working in the field and I especially like many of the regular presenters in our group, who are not only colleagues but also good friends. This is a part of academics. It is not an adversarial contest in which you tear others down to gain a foothold for your own work.

Authorship studies interest me because I deal with a field with a lot of pseudonyms. But it would mainly be used to bolster extrinsic evidence. I would not use a stylometric study as the sole evidence. But stylometrics can reinforce a set of circumstances or an educated supposition.

James

1134faktorovich
Modificato: Mag 2, 2022, 11:46 am

>1133 Keeline: This is where paying attention to precise rules of grammar and statistics comes in. The statement was: "If there is only a 1% chance of academic fraud... we have to take it as an absolute certainty." Without a phrase in this statement that clarifies the intended meaning, the present words are all that we can use to grasp it. And the missing phrase you are suggested was implied is, "out of all the legal academic activities that go on". This phrase does not even fit because the statement was not made with it in mind. The statement as-is just states that there is a 1% chance that there is any academic fraud anywhere in the world. If a specific institution or even a specific country was intended, these needed to be inserted for clarification. And there are plenty of statistical studies out there that confirm that academic fraud is far above 1%, such as in this article: https://unicheck.com/blog/academic-cheating-statistics Based on my personal experience grading papers/ tests as a college professor and using TurnItIn to check all papers for plagiarism, I am certain the problem is much worse than this article concludes (largely based on asking students if they cheat). It is much harder for academics to test themselves by considering the degree of academic fraud among professors, but there are some articles/ studies, such as this one: https://www.insidehighered.com/news/2019/10/03/study-academic-dishonesty-extends... I never claimed everybody in academia is fraudulent, but then again I am not conducting a study to determine the precise percentage. Without sharing specific data and how I know what I know to be true beyond articles about it published by others, it is simply a fact that there is a lot of academic fraud around the world and ghostwriting and plagiarism are major contributors to fraud at all levels from students to professors.

"So by translating and then measuring the highly-processed tests, are you not introducing the style traits of your own work on the texts? Maybe they all look similar because they have all received your similar processing?" No, first I measured the texts and reached the attribution conclusions that I posted on GitHub years ago. I have translated 15 books so far, and I have not re-measured the linguistics of the resulting translations. I took the standard steps to clean the files prior to testing, such as removing content by other bylines (editorial introductions, Gutenberg copyrights notices).

"...You could calibrate your tests on texts where you absolutely know the authorship." There is no such thing as a "control" where you "absolutely know the authorship" in unbiased attribution studies. If you accept any byline as an established fact without checking if it is indeed accurate; you are starting your experiment with a false assumption. This is different in most other scientific fields. For example, if you are testing a new chemical compound you designed for a shampoo, your controls might be known compounds that have been proven to achieve hair-cleaning results in the past. It might be a good idea to question if those compounds are indeed the best possible compounds for the purpose, but basically previous studies should have established their usefulness. But given the rates of academic fraud mentioned in the previous articles, it is irrational to believe in any bylines without testing them. Believing all stated bylines, or some bylines while doubting others is like studying the impacts of smoking and starting a study by assuming some brands do not cause cancer, while setting out to test if some brands (those funding your study) do cause cancer.

"You can't convince anyone by saying that practically everyone 400+ years ago was unable to write..." Funny that you say this, as I just translated a section where Verstegan writes something very similar in "Restitution" (after explaining why nobody before him appears to have reached any scientific conclusion regarding where Britons' ancestors were from): "Then, to seek out the reason why this conception would possess so many peoples’ minds, I can find none likelier than the lack of learning in former ages among the inhabitants of these parts of Europe; their Druids themselves did not have any knowledge of letters. So lacking the best means to conserve their true antiquity, they had the greatest cause to become wholly ignorant of their own origins. And some of them afterward, when the Romans came among them, came to get knowledge for the use of letters, and being curious for some way or other to seek out their origin, they might easily have found some supposition to make them fall into the theory of being descended from the Trojans (a concept perhaps that was much furthered upon a delight taken in Virgil’s verses). And some therein might have been glorying and extolling themselves, while others might thereby have been drawn to follow the fashion, and to imitate them in such a vainglorious conception, and for the fortification thereof might have sought forthwith to interpret the names of their cities (if in sound they had any nearness to anything concerning Troy) to have consequently been founded by the Trojans." I explain the details in the annotations, but Verstegan is really criticizing the illiterate people of his own age (as he also explains in the following sections), and is confessing why a small Workshop of working professional writers monopolized publishing in Britain. There are 17+ volumes in BRRAM, and each page presents evidence. You keep expecting the information from these volumes to be communicated to you without you having to read them; this is just not how knowledge accumulation works.

"This is why scientists devise experiments to isolate single measurements to identify the effect of those individual tests and understand what is going on." This is precisely what my method does. It provides the exact isolated data for each of the separate 27 quantitative and other more complex tests, alongside with the data for the combined attribution results of these tests put together. In contrast, other methods hide their raw data and the details of the isolated tests (specific words tested/ specific text-to-text comparison scores) in favor of simple graphs that appear to prove their hypothesis without any detailed data for readers to actually check if this is the case.

I do not "want to impugn the research and reputation of as many people". I do have to post reviews or my analysis of the problems of any research that is discussed in this thread, as it is about my series, and I would appear to be approving faulty findings by others if they were posted here and I did not comment on them at all. And in fact, in most cases, posters insist that I must submit a review of the articles being proposed as more accurate than my own method. Then, I simply conduct a frank review and explain why their method does not actually work when I have tested elements of it. I have only proposed a CV analysis of a single case study of a male professor who won a job I had applied for back in 2016-7; this professor invited me to conduct further research by volunteering additional information to what is publicly available. You guys were suggesting my conclusions regarding academia bias lack a basis in reality, and I presented this case study as an example to ground my hypothesis. Since the research methods of all previous computational-linguists I have reviewed (and I have reviewed hundreds of studies) have been faulty, I have nothing positive to report about them. As I do not lie, I have to say when there are problems with others' methods, especially when I am pushed to review them. It would be illogical for anybody to reach conclusions about my research being undeserving of study, merely because I have found problems with others' research.

I have been working 14-hours per day every day for over 2 1/2 years now on this BRRAM series with absolutely no funding. And really I have been working at this schedule on my research since 2005, or for 17 years. I am now going to return to translating "Restitution for Decayed Intelligence" (the title reinforces the idea of general illiteracy or "decayed intelligence"; he does not say, "increased intelligence", which he would have if Verstegan's statement about the growth in literacy quoted earlier is to be taken at face-value) for the rest of the day, and will return in the evening to respond to any questions you guys post for me here.

1135Petroglyph
Mag 2, 2022, 11:52 am

>1133 Keeline:

As per >1104 faktorovich: (and many other places in the thread -- and in her book, and in that interview) she is counting only the hits and ignoring the misses.

If a test is counted when it returns a 1, its result should be relevant when it returns a 0. But she only pays attention to the ones.

Her whole testing procedure is one giant exercise in confirmation bias: counting test results only when they indicate similarity, and not when they indicate not-similarity. When up to 63% of her tests (17/27) say "not similar", she chooses to ignore that and goes with the 37% that say "similar" (10/27). Sometimes even 8/27 similar (~30%) is enough. She is not considering all of the results that her tests give her -- she makes the choice to only consider the 37% that indicate similarity, i.e. the kind of result she is looking for. This is the definition of bias.

Her results are so different from anyone else's because she is treating random fluctuations in her biased results as the signal ("clusters in the data"). Well, that's part of the reason. But it's the part that invalidates the entire practical approach.

"A match for any two texts is a roll of the dice that is extremely unlikely if results are random"
And she is making sure that on every duplicate tab in her spreadsheet (one per text), for every test she looks at, she is marking ~18% of all texts as similar. Not the values that fall within ~9% above or below the value of her target text, but 9% of her corpus above and below the target text.

Her estimation of what is "extremely unlikely" is way, way off. And by marking ~18% of her entire corpus as "similar" for every single test and on every single tab, she is massively, massively inflating the "positives", and she is all but ensuring that the procedure will turn up many false positives. The chances of a match for any two texts is so much higher because of the way she is counting things.
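To put a rough number on that inflation, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each of the 27 tests independently flags a given unrelated text as "similar" with probability 0.18; real tests are correlated, so treat the output as an order-of-magnitude guide rather than a claim about the actual spreadsheets.

    # Back-of-the-envelope estimate under a toy independence assumption:
    # each of 27 tests flags an unrelated text as "similar" with p = 0.18.
    from math import comb

    def prob_at_least(k, n=27, p=0.18):
        """P(at least k of n independent per-test flags fire), binomial model."""
        return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

    p_pair = prob_at_least(10)    # one unrelated pair reaching a 10-of-27 "match"
    n_pairs = comb(284, 2)        # number of text pairs in a 284-text corpus
    print(f"P(>=10/27 flags for an unrelated pair): {p_pair:.3f}")
    print(f"Expected spurious 10+/27 pairs out of {n_pairs}: {p_pair * n_pairs:.0f}")

Under those toy assumptions the chance that any single unrelated pair clears a 10-of-27 bar is only on the order of one percent, but across the roughly 40,000 pairs in a 284-text corpus that still works out to several hundred spurious "matches".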

It seems like she thinks that there exists something like "the 18% similarity vs 82% divergence rule" (>124 faktorovich:) that she apparently feels she has to impose on her data [1].

This is exactly the kind of thinking of someone who unironically says things like this:

"The more I apply computational-linguistic tests to texts and research these authorship mysteries, the more convinced I am that there are no coincidental matches; all matches indicate shared authorship." (>305 faktorovich:)

"I have not made a single false statement in this thread, and I don't believe I have ever made a false statement in my life. (>1050 faktorovich:)



[1] I don't know what that is. Some weird variant of the Pareto principle?

1136Petroglyph
Modificato: Mag 2, 2022, 12:01 pm

>1130 faktorovich:

iF a rEvIeWer OF aN ArticLE caNNot "GRASp" wHat thE aRtiCle iS AbOuT siMply fRom reAdiNG tHe AbSTrACT



Statistically speaking, there is a 100% chance that there is academic fraud somewhere in academia

1137faktorovich
Mag 2, 2022, 9:04 pm

>1135 Petroglyph: It is absolutely untrue that I only consider the 1's and not the 0's. In a binary calculation, it is impossible to only consider one side of a binary. The number of matches between texts only makes sense when it is compared to the number of non-matches for those same texts; the frequency of matches to a given authorial-signature, and the absence of matches to the other authorial-signatures in the corpus is what solidifies the attribution decision.

You keep repeating the few cases where two texts have lower levels of similarity, without confirming that you understood my explanation that co-authorship means the similarity to each individual author must be lower for the attribution data to be accurate. If there are 3 different co-authors, and your data shows that each or even one of them has a 90% match to a given text, this is a nonsensical finding that is a statistical impossibility (90 * 3 = 270%, far greater than the ideal result of 100% for the total percentage of linguistic styles within a text).

"And she is making sure that on every duplicate tab in her spreadsheet (one per text), for every test she looks at, she is marking ~18% of all texts as similar." This is the first step in a multi-step method, and you are describing it as if it the only step. Did you stop reading after the first step? The 18% that are chosen as similar in the first step are not automatically decided to be matches; it just means a single 1 is added that indicates a similarity on one of the tests between 2 texts. Then, the next step is measuring similarity on the other 26 tests and adding up how many of the tests they had in common. Each test checks something very different, and so most texts have some of these tests showing divergence, while others are within the similarity range. Thus, the 18% cut-off is just a challenging measure for texts to meet when there are 284 different texts in a corpus. If you don't understand my method, try putting your objection in the form of a question. I have no idea how you can mistake a measure of minimum similarity, with all texts being marked as similar. What you are stating is just the opposite of true - "'similar' for every single test and on every single tab" - no, I do not mark 1's in every single tab; that is a ridiculously untrue accusation. There is no relationship between the "Pareto principle" ("for many outcomes, roughly 80% of consequences come from 20% of causes") because there are no "causes" or "outcomes" in my method; so this is another case of you using convoluted words to say things that are nonsensical.

1138faktorovich
Mag 2, 2022, 9:09 pm

>1136 Petroglyph: The quality of "Scholarly Literature" is partially judged by its ability to present information in a manner that is at minimum comprehensible to a scholar who has published 2 scholarly books. Otherwise a scholar might as well write an abstract that is made up of completely nonsensical words, and then accuse all readers who fail to understand any of these made-up or improperly ordered words of suffering from "reading incomprehension".

I have not manipulated any data in my discussion about "Academia", as I have simply cited articles that present the exact data without even summarizing it. You are the one who is manipulating data when you state that there is only specifically 1% of academic fraud in academia, without citing any sources for this fictitious conclusion.

1139Petroglyph
Modificato: Mag 3, 2022, 12:35 pm

>1137 faktorovich:
I wrote: "on every duplicate tab in her spreadsheet (one per text), for every test she looks at, she is marking ~18% of all texts as similar." (emphasis added)

In your writing, that becomes: "This is the first step {...} Did you stop reading after the first step? {...}Then, the next step is measuring similarity on the other 26 tests."

Your reading comprehension stinks. You make unwarranted leaps to wrongheaded conclusions in your head, and go on to pretend that the other party did so. It's called blame-shifting and straw-manning.

"I have no idea how you can mistake a measure of minimum similarity, with all texts being marked as similar. "

It's because you jump to hasty, unwarranted conclusions. The mistakes are coming from inside your own head. Read my comment again, but more slowly. And more carefully.

no, I do not mark 1's in every single tab; that is a ridiculously untrue accusation

It's because you jump to hasty, unwarranted conclusions. The mistakes are coming from inside your own head.

I did not say that you mark a 1 on every single tab once the first test on the first tab says 1. I said it is a mistake to mark ~18% of all the texts in your corpus as similar, instead of all the texts within ~9 of each side of the target text's value, and that you are repeating that mistake on all 27 tests, and that you are repeating that mistake on every tab: "on every duplicate tab in her spreadsheet (one per text), for every test she looks at, she is marking ~18% of all texts as similar."

Or, you know:



>1138 faktorovich:

Yeah. You don't understand memes. That's ok. I'll understand them for the both of us.

you state that there is only specifically 1% of academic fraud in academia, without citing any sources for this fictitious conclusion

It's a joke. All these images are jokes. I'm ridiculing your grasp of statistics along with your obsession with claiming data manipulation in academia.

Also, that image said "if there's only a 1% chance of academic fraud"; in your writing, that becomes "you state that there is only specifically 1% of academic fraud in academia".



1140anglemark
Mag 3, 2022, 9:24 am

>1137 faktorovich: If there are 3 different co-authors, and your data shows that each or one of them has a 90% match to a given text; this is a nonsensical finding that is a statistical impossibility (90 * 3 = 270% or too much greater than the ideal result of 100% for the total percentage of linguistic styles within a text).

Faktorovich, is your claim that if three authors have collaborated on a text, it is impossible for each of them to have an individual 90% "match" to the text?

Is your claim that if you have three authors collaborating on a text, and your "tests" show that
* one of them has a 50% "match" in terms of authorial signature
* another one has a 40% "match"
...it will then be impossible for the third one to have a "match" greater than 10%?

Please just respond "yes" or "no" to each of the two questions above. Thank you.

-Linnéa

1141faktorovich
Mag 3, 2022, 12:26 pm

>1139 Petroglyph: "...it is a mistake to mark ~18% of all the texts in your corpus as similar, instead of all the texts within ~9 of each side of the target text's value". I do not make this "mistake", as I do indeed mark "~9 of each side" as similar. This is exactly what my procedure is, as you should have figured out if you read any of the chapter that describes it. The term "similar" does not mean that I am attributing 18% of all texts on each test to the same byline, but rather it is a simpler marker of proximity; these markers combined on 27 different texts establish the overall groupings of the texts into authorial signatures. All attribution methods must have a measure of proximity or at what level texts can be judged to be similar to each other. An 18% cut-off point is very low, since it marks 82% as being dissimilar, and so only texts with a high degree of similarity are given a point. If you run a word-frequency test, you have to know what percentage of similar word usage establishes texts as similar. The raw data set generated by Stylo gives percentage points such as .86/.66 etc. for the different compared words between 2 texts. The researcher has to establish if .66 is a non-match. The manner in which you set up your experiment will determine if these percentages are higher or lower. For example, if I had set the per-test cut-off at 30% instead of 18%, similarity between texts would approach 27/27 tests or I would end up with comparative similarity scores of near 100% or .1. But this broader definition of similarity would fail to register the intricate or slight similarities that my approach registers even for mere editors or those who contribute a small section to a text, and not only the dominant writer and co-writer. I have applied my method to many different corpuses and have fine-tuned the method to maximize attribution precision, and not if my approach sounds cool or marketable to those who do not understand the statistical explanation for it that I offer in Volumes 1-2.

You have to evolve your capacity to write jokes. Lies and insults are not funny to anybody but psychopaths.

For example, I have driven from one coast of America to the other over a dozen times, and I do not lose control of my car or go into a skid while turning, even when doing so suddenly.

You are using "straw man" arguments, as you repeatedly avoid discussing any of the complex points I raise, in favor of personal insults, nonsensical digressions, and general outbursts of rage.

1142paradoxosalpha
Mag 3, 2022, 12:53 pm

"For example, I have driven from one coast of America to the other over a dozen times, and I do not lose control of my car or go into a skid while turning, even when doing so suddenly."

Petroglyph writes the jokes, but Faktorovich rushes in with the punch lines!

1143faktorovich
Mag 3, 2022, 1:17 pm

>1140 anglemark: I have already answered that the total percentage match should not be over 100%, so if there are two co-authors, both combined should add up to 100%, perhaps at 50/50, or 40/60. But such precise outputs are impossible in an actual corpus with 284 different texts. One exception for why it might be possible for both authors to have a 90% match between their texts and a text being tested is if these authors have written several texts together with similar percentage contributions in each of the texts they collaborated in, or let's say a 60/40 split, thus all texts they collaborated in at this approximate measure of collaboration can have a 90% similarity. This is why it is important to evaluate all of the data across a corpus and across not only the 27 quantitative tests, but also non-quantitative tests like 3-word-phrases and structural patterns when analyzing what the quantitative conclusion means about the authorship attribution for a specific text. As for your second question, again, I am not setting any texts or authors as pre-determined, so it is not as simple as a 100% max, with no possibility of over 10% for the third author. I refer to as a "match" primarily the individual test scores, and the combined scores out of 27 tests. But I base the final attribution decision on how many texts within a given signature a text matches. A few examples are needed to explain it. Let's start with the "Lyly"-assigned and Percy-ghostwritten play "Sapho and Phao" (1584); this was one of Percy's first plays that he wrote while in college; it only matches other Percy-ghostwritten texts at 10 or more tests (without any matches to any of the other 5 ghostwriters at 10 or more tests; it is such a strong match it even also matches some Modern English translations of Percy's "Shakespeare" plays). Yet, the strongest matches out of the 16 other Percy texts it matches at 10+ tests are at only 13-tests. These top 13-test matches are the "Munday"-assigned "Fidele and Fortunio" (1585) and the "Shakespeare"-assigned "Pericles", Acts 3-5 (1609). Percy is most likely to have written plays entirely independently of a co-author at the start of his career 1584-5, whereas later in his career he frequently co-wrote with Jonson, as they split acts or scenes to speed-write plays. The data for "Sapho" begins to build this narrative for who wrote this text and how, which is strengthened when it is checked against other pieces of evidence. In contrast, we can compare this outcome to a couple of collaborative projects between Percy and Jonson. "Cavendish's" "Variety" (1649) 10 of Jonson's texts match it, and 7 of Percy's texts; there are no matches with any of the other ghostwriters at 10 or more tests; there is only 1 match at 13-tests to the co-bylined "Chapman" and Jonson "Eastward Hoe" (1605), which primarily matches "Percy", but also has a few matches to "Jonson"; thus, "Variety" and "Eastward" have the strongest match because they were both co-written in similar percentages between Percy and Jonson. A 13-test match is 48% of the total 27-tests, but with matches to both Jonson and Percy for some of these texts at up to 12-tests, that's 44% + 44% = 88%. 
In another example, "Chapman's" "Sir Gyles Goosecappe" (1606) has a single 10-tests match to Harvey, as well as 16 matches to Percy and 4 matches to Jonson; it has 4 matches at 14-16-tests (51-59%); the highest of these matches at 16 is to "Cavendish's" "Country Captain", which has 4 matches to Jonson, and 24 matches to Percy; as you can see there is a very similar split between Jonson and Percy in both "Country" and "Gyles" and this is why they have a uniquely strong match to each other, even when they have weaker matches to texts that Percy or Jonson wrote more independently from each other.

No previous computational-linguistic study has tested a corpus of 284 texts from the Renaissance with 104 different bylines. Instead they have looked at small groups of texts or bylines that can make it seem that bylines are correct as-is, or that anonymous texts can be assigned to a specific byline that matches it. But when one zooms out and considers the at least 220 plays Percy confessed to ghostwriting under different bylines at different degrees of cooperation with Jonson (and occasionally some help from others), testing so many very similar texts against each other produces so many similarity matches that they all appear to be further away than they really are on a spectrum of very proximate results. Previous computational linguists are likely to have noticed these high rates of similarity, and decided to avoid testing a diverse corpus to avoid reaching a conclusion that contradicts nearly all of the existing bylines. My data is very consistent in its results, as I explain in Volumes 1-2. Feel free to ask me more specific questions, and I will explain further.
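The final step described above, assigning a text to whichever signature group it matches the most members of at 10 or more tests, can be sketched as follows. The group labels, scores, and cutoff are invented for illustration; this is a reader's reconstruction of the stated decision rule, not the author's actual code.

    # Hypothetical sketch of the "which signature does this text match most?" step:
    # given pairwise match counts (out of 27 tests) and a provisional grouping of
    # texts into signatures, tally how many members of each group the target text
    # matches at 10 or more tests.
    from collections import Counter

    def attribute(target, pair_scores, groups, cutoff=10):
        """pair_scores: {(text_a, text_b): matches_out_of_27};
        groups: {text_name: signature_label}."""
        tally = Counter()
        for (a, b), score in pair_scores.items():
            if target not in (a, b) or score < cutoff:
                continue
            other = b if a == target else a
            tally[groups.get(other, "unknown")] += 1
        return tally.most_common()

    # On numbers like those in the examples above, a text matching 16 "Percy"-group
    # texts and 1 "Harvey"-group text at 10+ tests would return
    # [("Percy", 16), ("Harvey", 1)].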

1144paradoxosalpha
Modificato: Mag 3, 2022, 1:30 pm

OK, this is so funny I can't let it go.



"I do not lose control of my car or go into a skid while turning ..."

That's not what I see in the image anyway. What I see is that the blue car might easily have driven forward down the many lanes of honest discussion, but belatedly glimpsing the possibility of a turnoff into Strawmanning, cut across the painted separator to reach the desired ramp.

1145Petroglyph
Modificato: Mag 3, 2022, 2:14 pm

>1141 faktorovich:

Petroglyph wrote: "it is a mistake to mark ~18% of all the texts in your corpus as similar, instead of all the texts within ~9 of each side of the target text's value".

Faktorovich misinterpreted that as: "I do not make this "mistake", as I do indeed mark "~9 of each side" as similar. This is exactly what my procedure is, as you should have figured out if you read any of the chapter that describes it."



Alright, pictures it is.

I had R generate 30 random numbers between 1 and 99 and wrote them into a .csv file:



I then pulled these numbers into a spreadsheet, arranged them from smallest to largest, and copied them over into three columns. Let's pretend these are the results of three tests for thirty texts. Here is that table (right-click to embiggen):



For each column I picked a number and bolded it. These are the "target values" -- let's pretend this is the text we're comparing the others to. (I put the same data in three columns so that my three illustrations wouldn't accidentally overlap visually.)

Then, in yellow highlighter, I coloured all the values that are within ~9% of the "target value", above and below, for a total range of 18%. In other words, I looked at the contents of the bolded cell, and highlighted in yellow the adjacent cells that are ~9% higher and ~9% lower than the contents of the bolded cell. I did this for each column.

These are the "matches" as Petroglyph would count them.

Then, in a red font, I coloured all those values that are within ~9% of either side of the target value as measured by the size of the corpus. 9% of 30 is 2.7, so I rounded up to 3. For each column, I coloured three cells above the target value, and three below because that is what ~9% of the corpus size is.

These are the "matches" as Faktorovich counts them.

Petroglyph has understood very, very well that Faktorovich does it the red-colour way. Petroglyph is trying to tell Faktorovich that the yellow highlighting way is the better one. (Reread my quote at the start of this post again, and see if it's clearer now.)

It is clear from the non-overlapping areas in every column that the yellow highlighting and the red colouring are measuring two different things: if you replace the yellow values with "1" and the rest of the column with "0", you'll mark a bunch of red cells as "0". On the other hand, if you were to replace the red cells with "1" and the rest of the column with "0", you'd be marking a bunch of yellow cells "0". You'll get partially different results depending on which method you choose for scoring "similarities".

The yellow highlight way would be a justifiable approach: the reason why two texts are scored as "similar" for a given test is based mainly on the value inside the cell. Data is the leading principle. This means, however, that you'll often end up with an unequal number of "matches" above and below. But that's fine: it's the values inside the cells that are important.

The red colour way is just bonkers: the reason why certain texts are scored as "similar" for a given test is the size of the corpus. For a corpus of 100 texts, Faktorovich would systematically score 18% of the texts as "similar" in each of the 27 tests, regardless of how close (or how far apart) the values inside the cells actually are. For a corpus of 284 texts, that would mean that she would score 51 texts (~25 on either side) as "similar", regardless of the actual values inside the cells.

Faktorovich does it the red-colour way. For every column/test, on every tab of her spreadsheet, she consistently marks ~9% + ~9% = ~18% of the texts in her corpus. (And she still has to work with ~13/27 similarity ratings!!!)

I'm not sure what Faktorovich thinks: does she think she's doing things the yellow-highlight way while in reality going about it the red-colour way? Or does she genuinely think that the red-colour way is correct? Either way: her scoring system is bonkers.
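The difference between the two counting schemes is easy to reproduce in a few lines. The sketch below redoes the illustration in code: thirty random values, one target, then (a) every value within 9 units of the target's value versus (b) a fixed three nearest neighbours on each side, regardless of how far away their values are. The numbers are random and purely illustrative.

    # Reproducing the yellow-vs-red comparison: value-based window vs. rank-based
    # window around one target text, on 30 random "test results".
    import random

    random.seed(1)
    values = sorted(random.uniform(1, 99) for _ in range(30))
    target = values[14]  # an arbitrary target text somewhere in the middle

    # (a) "Yellow" counting: everything within +/- 9 units of the target's value.
    value_window = [v for v in values if v != target and abs(v - target) <= 9]

    # (b) "Red" counting: a fixed ~9% of the corpus on each side (3 of 30 texts),
    # however near or far their values actually are.
    idx = values.index(target)
    rank_window = values[max(0, idx - 3):idx] + values[idx + 1:idx + 4]

    print("value-based matches:", [round(v, 1) for v in value_window])
    print("rank-based matches: ", [round(v, 1) for v in rank_window])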



Now, the random numbers I had R generate for me were evenly distributed (which means that the yellow and the red "matches" aren't all that far apart). But real-life test results aren't always so well-behaved. So let's take a look at some of the actual cell values that Faktorovich bases her similarity on. Here's a screenshot from the spreadsheet she uploaded as part of her own Lunch Break Experiment (tm):



I added the rows at the bottom that are highlighted in green. Those rows contain, for each column, the minimum value in that column; the maximum value; and the difference between them -- how large of a jump is there between max and min?

You can tell straightaway that Faktorovich is working with some very narrow ranges: in the columns for Adjectives, Verbs, Adverbs and Prepositions and Pronouns, for example, the difference between the min and the max value is ~2-4%.

The yellow-highlighting way would end up marking all of these as similar, whereas the red-colour way wouldn't. Oh. my. god. This is why Faktorovich does things the red-colour way, isn't it?



Question for Faktorovich: Which way do you measure similarity? The yellow highlighter way, or the red font colour way? Or if there's another way: which is it, and why have you chosen it?



1146Petroglyph
Mag 3, 2022, 2:17 pm

>1142 paradoxosalpha:

That was very on-brand of her, indeed!

1147Petroglyph
Mag 3, 2022, 2:19 pm

Questions for Faktorovich:

Can you show me Verstegan's authorial signature? And Percy's? And Jonson's? All six of them, in fact.

For the Lunch Break Experiment (tm) in >290 faktorovich:, what are the authorial signatures of Jane Austen, and Marie Corelli and all the others?

1148amanda4242
Modificato: Mag 3, 2022, 2:24 pm

I've continued this in a new thread. My apologies to those who wanted to see how long this thread could get, but it was just taking far too long to load.

1149Petroglyph
Mag 3, 2022, 2:41 pm

>1137 faktorovich:

In a binary calculation, it is impossible to only consider one side of a binary.

And yet you are managing it. You are managing it by treating both halves of the binary differently: in considering one half of the binary in one way, and the other half in a very different way, what should have been a binary is now something else: a mistake. It is only "impossible to only consider one side of a binary" if no-one is making any mistakes.

The number of matches between texts only makes sense when it is compared to the number non-matches for those same texts; the frequency of matches to a given authorial-signature, and the absence of matches to the other authorial-signatures in the corpus is what solidifies the attribution decision.

I know that is how you are treating your results. And it's a very basic error. It's something that I've been trying to explain to you. In addition to yellow highlighting and all that. This is a separate problem from that.

The problem is that you are treating the outcome of your test results differently depending on whether a 1 or a 0 is returned. If you interpret the ones as matching a "given authorial-signature", then you have to count the zeroes as not matching that same author signature.

Your tests comparing two texts are attempts at answering the question "are these two texts by the same author?" and the answer is returned in a series of ones and zeroes. The ones indicate matches to this author; the zeroes indicate non-matches to that same author.

When you compare the matches and non-matches between any two texts, ones and zeroes both answer the same question. When your tests for one text return a 13/27 similarity score to a second text and, therefore, 14/27 dissimilarity score, that result means "over half of my tests say that these two texts are not by the same author". The 14/27 zeroes aren't just "absence of matches" for an entire group of "other authorial-signatures" in your corpus, they are evidence against these two texts being by the same author.

"Is this text by any other author in this corpus?" is a different question, with different answers. You can't interpret the ones as answering one question, and the zeroes as answering another. That is an error. It leads to confirmation bias.

My point is that you are counting the ones and the zeroes differently. You're counting the ones as "evidence" for a single author signature, and when you consider the zeroes, you are counting them as lack of evidence for something else entirely. You switch interpretive frameworks between the ones and the zeroes. That is a mistake.

This is what I mean when I say you only count the ones: the ones are interpreted as evidence for authorship attribution to a specific author; they are treated as evidence backing up the outcome you want. The zeroes are explained away: they are interpreted in such a way that they become irrelevant for the question that both ones and zeroes should be answering.

This is how confirmation bias works: Evidence for the desired outcome is treated more favourably than evidence against.

Any conclusions you've arrived at on the basis of this style of counting are null and void.

1150faktorovich
Mag 3, 2022, 8:46 pm

>1144 paradoxosalpha: If that is the intended meaning, you just have to substitute "Faktorovich" with pretty much anybody else's name in this discussion. Either way, whoever chooses to rely on "strawmanning" instead of "honest discussion" probably would not have made a sudden decision to do so, but rather would have this as their standard method of argument.

1151faktorovich
Mag 3, 2022, 9:26 pm

>1145 Petroglyph: So you are saying that you are counting as similar texts that are within specifically 18% on the two sides of the "target value" (where you can have any number of matching texts depending on which have values that are in this percentage range), whereas I count 18% of all texts (without regard for what specific values these have). Your method can only work in a corpus with a single test-type, such as word-frequency, as in this method you might be able to choose a specific percentage that would make sense. Because I am combining the outputs of 27 different tests, I cannot use any standard percentage cut-off for all tests. For example, the exclamations test is frequently near-zero or zero for a significant portion of texts. But there are some that have high exclamation values. If you are measuring 18% of the highest value in the range, you might judge not only all 0's, but also most of the other exclamation values to be similar to each other or a 1 (if a text falls in this group), whereas only a few outliers near your highest-value would be dissimilar or 0. In contrast, my method can be applied consistently to all the different tests without creating absurd or irrational outputs. (Another example is my test for the average number of characters-per-word which tends to fall into a pretty narrow percentage range of each other, but has results that have tiny differences between texts; thus, if you choose an 18% range from one volume you might have only a couple of matches, while from another volume, you might have most of the texts in the corpus. Either one of these outputs would pretty much cancel the significance of running this specific test, as a match to almost all, or to almost none is nonsensical in a corpus where there are dozens of plays from bylines like "Shakespeare", or an expectation of multiple matches between texts.) You would understand why what you are proposing is impractical if you had tested several different corpuses with 27 different tests and not by merely feeding texts into Stylo and relying on the output it generates. By looking closely at the results, I have adjusted my method to maximize its accuracy.

Then, you have an epiphany that I have "some very narrow ranges" in some of the 27 tests, such as Adjectives, where there is a range of 2 points between the minimum and maximum. If I measured Adjectives by calculating 18% from any value on such a tight curve, it would increase the number of matches for each test, and it would dramatically increase the percentage of tests on which texts would match. Just as the exclamations are clustered around 0, Adjectives also have a clustering point around which most of the texts are within 18% of each other. So by following your method, I would be engaging in a trick that would make it seem as if a lot more texts are similar to each other. But the problem with this approach is that while the total "matches" would go up, the number of false positives would also increase. My goal is not to see the highest possible percentile match, but rather the most accurate possible results, and this is achieved by just counting a percentage of texts and not the percentage between their numeric outputs.

Consider this experiment: you are trying to figure out which children out of a group are most likely to be the sons of which fathers. You don't have access to DNA testing, so you are sorting them by obvious traits like hair color and eye color, etc. You create a table for each of these tests. Let's look at the test for eye color: you have chosen to register specific colors instead of just sorting the kids into blue, brown, etc. categories. Perhaps you happen to have a pool of children that mostly have variants of the brown eye color. If you use the 18% measure as a cut-off while placing children on the rainbow spectrum, you might have up to 95% of the kids falling closer to the brown color than to the other colors. You have thus made this test statistically useless, because it is not registering the variations within brown that your careful photographing of eye color and separation of it on the full spectrum was designed to provide. If you instead separate the children on this spectrum and then choose the 18% of children that are closest in eye color to a tested father, then you will receive a specific answer to the question of which children are similar in eye color to each of the potential fathers. Do you have any questions about this?
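The objection about near-zero or tightly clustered tests can be made concrete with a small sketch. Below, most of an invented "exclamations" column sits at or near zero; a value-based window then flags almost the whole corpus as similar, while a rank-based cutoff always flags the same fixed share. The figures are made up; the only point is to show where the two rules diverge.

    # Invented "exclamations per 1,000 words" column in which most texts score at
    # or near zero. Compare how many texts each rule flags as similar to a target
    # scoring 0.1 (the target itself is counted in both, which does not affect the
    # comparison).
    exclaim = [0.0] * 20 + [0.1, 0.2, 0.3, 0.5, 1.0, 2.0, 4.0, 6.0, 9.0, 15.0]
    target = 0.1

    # Value-based window: everything within 1.0 of the target's value.
    value_flags = sum(1 for v in exclaim if abs(v - target) <= 1.0)

    # Rank-based window: the ~18% of the corpus (here 5 of 30 texts) whose values
    # sit closest to the target.
    rank_flags = len(sorted(exclaim, key=lambda v: abs(v - target))[:5])

    print(f"value-based rule flags {value_flags} of {len(exclaim)} texts")
    print(f"rank-based rule flags {rank_flags} of {len(exclaim)} texts")

Neither output is self-evidently right, which is why the choice of rule, and how its output is then interpreted, carries so much weight in the exchange above.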

1152faktorovich
Mag 3, 2022, 9:30 pm

>1147 Petroglyph: The authorial signatures for each of these authors are registered in the data output in the tables and diagrams I have previously provided. All of this data put together is a story in numbers about the type of authorial signatures these authors have. I narrate what the six Renaissance signatures signify across the BRRAM series, which you requested a copy of, so you should have those lengthy chapters of explanation. Can you clarify what you are trying to ask here, if you are not asking me where to find my data?

1153amanda4242
Modificato: Mag 3, 2022, 9:47 pm

Just a reminder that this thread has been continued. https://www.librarything.com/topic/341561#

To link to specific posts from this thread click "more" in the post you want to link to, click "link," and copy and paste the url into your post on the new thread.

1154faktorovich
Mag 3, 2022, 9:56 pm

>1149 Petroglyph: You are just digressing into nonsense again. "If you interpret the ones as matching a "given authorial-signature", then you have to count the zeroes as not matching that same author signature." This is exactly what I am doing: I am counting the 1's as a match, and 0's as a non-match. There is no other option. Only if a researcher is manipulating or inserting fictitious data would it be possible in a binary system to do otherwise than to count 1's as a positive, and 0's as a negative outcome.

An under-10 match does not mean two texts are not by the same author; it just means that there are over 18% of texts in the corpus that are more alike than these particular two texts under review. In a corpus of 284 texts, there are 74 texts in the Percy group, while 18% of 284 is 51; but there are also 29 Modern English texts that occasionally match Percy or Jonson's signature, but mostly are matches to each other; and there are also many cooperative texts written by Percy and Jonson beyond these. This skewing of the corpus towards Percy (and to a lesser degree Jonson) is the result of "Shakespeare", "Lyly" and other Renaissance dramatists being of greater interest to researchers, and thus their works had to be analyzed in this study. Thus, when applying the simple 18% measure on each test, and given that over 36% of the corpus is similar to each other (or to Percy's style), it would be impossible for this 36% to fit within the 18% similarity cut-off point (in either type of cut-off, be it for 18% of texts, or within 18% of the compared-against value). There are still matches between all of these different Percy texts, but they are not all registered for each individual text-to-text comparison. Take a look at this diagram on GitHub: https://github.com/faktorovich/Attribution/blob/master/Diagram%20-%20Percy.jpeg It shows the intersecting similarities between all of the texts in this group, even if checking a single text-to-text match might not register all of these for each text. In most cases there are 7-9 matches for the majority of these texts that do not fit in the 18%. When I tested the 18th century corpus, most of the texts had very high matches, up to 95%, to other texts by the same author; this was because there was no cooperative writing between most of these authors, and most of the bylines were accurate with minimal multi-byline ghostwriting. I am going to return to re-examining a larger corpus from the 18th century after I finish BRRAM by January 2023 or so. The British Renaissance's data is indeed extremely unusual, as it represents only six ghostwriters who frequently collaborated or even plagiarized each other and their previous work. Without testing these same 284 texts with a method of your own choosing and posting your full raw/ processed data, you cannot argue that there is any truth behind the current byline attributions to these texts, when my raw data clearly disproves these claims as blatantly untrue.

1155Crypto-Willobie
Mag 5, 2022, 9:59 am

1155!

1156conceptDawg
Modificato: Feb 14, 5:25 pm

This message has been deleted by its author.
This conversation was continued by Who Really Wrote the Works of the British Renaissance? thread 2.