Wikipedia talk:Reliable sources

From Wikipedia, the free encyclopedia

AI-written citations?

I was adding an event to an article (Special:Diff/1220193358) when I noticed that the article I was reading as a source, and planning to cite, was tagged as being written by AI on the news company's website. I've looked around a bit, skimmed Wikipedia:Using neural network language models on Wikipedia, WP:LLM, WP:AI, WP:RS and this Wikimedia post, but couldn't find anything directly addressing whether it's ok to cite articles written by AI. Closest I could find is here on WP:RS tentatively saying "ML generation in itself does not necessarily disqualify a source that is properly checked by the person using it" and here on WP:LLM, which clearly states "LLMs do not follow Wikipedia's policies on verifiability and reliable sourcing.", but in a slightly different context, so I'm getting mixed signals. I also asked Copilot and GPT3.5, which both said AI-written citations are neither explicitly banned nor permitted, with varying levels of vagueness.

For my specific example, I submitted it but put "(AI)" after the name. I wanted to raise this more broadly because I'm not sure what to do. My proposal is what I did: use them but tag them as AI in the link. I'm curious to hear other suggestions.

I've put this on the talk pages in Wikipedia:Using neural network language models on Wikipedia and Wikipedia:Reliable sources. SqueakSquawk4 (talk) 11:36, 22 April 2024 (UTC)[reply]

For me it comes down to a case-by-case basis. If AI is being used as part of the process, but ultimately the article is from a real person and editor, then it's probably fine. The issue comes from articles completely written by AI with little or no oversight.
The site has an AI disclaimer[1] where they say they only use AI in the first way, not the latter. So on that point I would think it should be ok. -- LCU ActivelyDisinterested «@» °∆t° 13:02, 22 April 2024 (UTC)[reply]
@SqueakSquawk4, do you absolutely need that source? If you can find a better one, then I suggest using the better one instead. WhatamIdoing (talk) 02:03, 24 April 2024 (UTC)[reply]
A) I kinda do, it's the only citation I found with everything in the same place. If I took it out I'd have to put in 2 or 3 separate citations to not leave something uncited.
B) I was trying to ask more generally, with the one I found as just an example rather than really the focus of what I was asking.
C) @ActivelyDisinterested Thanks, didn't spot that. SqueakSquawk4 (talk) 12:32, 25 April 2024 (UTC)[reply]
  • AI = NO Considering the "hallucination" issue that LLMs have, and, in fact, considering how they are constructed at a base logic level, I would categorically treat any "AI" source as intrinsically non-reliable. If a news agency is found to be using "AI" constructed articles on a regular basis then that source should be deprecated. Simonm223 (talk) 12:42, 25 April 2024 (UTC)[reply]
    Simon, I think black-and-white rules are easy to understand, but hallucination is only an issue when it appears. AI sometimes generates false claims. If it's writing something you know to be true and non-hallucinated (e.g., because you've read the same claim in other sources, or because it's the kind of general, non-controversial knowledge that the Wikipedia:No original research says doesn't require a citation, like "The capital of France is Paris"), then that problem is irrelevant.
    @SqueakSquawk4, editors might accept this source, especially in light of what AD says. However, if the content is important to you, you might consider using the three other sources instead of (or in addition to) this one, to make it harder for someone to remove it on simplistic "all AI is wrong and bad" grounds.
    As a tangent, we've never defined reliable sources. Unlike an article, which would doubtless begin with a sentence like "A reliable source is...", this guideline begins with "Wikipedia articles should be based on reliable, published sources". I suggest that the actual definition, in practice, is "A reliable source is a published source that experienced Wikipedia editors accept as supporting the material it is cited for". Some editors strongly oppose AI-generated sources, and we can usually expect that some editors won't take time to understand the nuances behind using AI as a convenience vs using AI unsupervised to generate content wholesale.[*] Therefore, I'm uncertain whether it would be considered reliable if it were ever seriously disputed.
    [*] This is happening in the real world, with a student accused of plagiarism without any evidence except Turnitin thinking it was AI-generated,[2] so it'll happen on wiki, too. WhatamIdoing (talk) 17:10, 25 April 2024 (UTC)[reply]
    I read on some AI-test tool I tried a caveat, something like "don't use this to punish students." Gråbergs Gråa Sång (talk) 18:22, 25 April 2024 (UTC)[reply]
    IF you have double checked the AI generated source, and it a) actually exists, b) is reliable and c) directly supports the information in the article… then it doesn’t really matter how the source was “generated”. The key is that a human has checked it. Blueboar (talk) 12:40, 3 July 2024 (UTC)[reply]
The general standard applied to trusted news organizations is that it is assumed that they have a process to ensure that their articles are sufficiently reliable, regardless of which specific writer wrote the article. We do not say: You can trust NYT articles if they are written by Mary, but not if they are written by Bob. In theory, there is no difference in this regard between articles written by humans or AI. If they do not fact-check articles written by AI, then it is likely that they also don't fact-check articles written by human writers. And it is certainly possible (in theory) that a news organization only publishes AI articles that are thoroughly fact-checked and corrected, although the use of AI is a red flag that suggests that they are cutting corners.
But that is all generic theory. I would argue that Hoodline is not a good source in general, since they don't even have their own Wikipedia page. Also, Wikipedia states this about Nextdoor, the company that owns them:

In 2019, Nextdoor acquired local news site Hoodline. Later that year, HuffPost and Wired reported that Nextdoor paid a firm to improve its reputation by lobbying for changes to the Wikipedia articles on Nextdoor, NBC, and several other corporations

If they do this, then I have no faith in the quality of Hoodline's reporting. This may just be an AI-generated platform to place ads on, with no journalistic standards. This is something that has cropped up in recent years and will probably become a bigger issue as AI improves, becomes cheaper, becomes easier to use, etc. Aapjes (talk) 11:59, 30 August 2024 (UTC)[reply]

Where is the list of consensus of which websites are reliable?

Where is the list of consensus of which websites are reliable? Personally I find it to be extremely hard to find. Please make it easier to find. NamelessLameless (talk) 06:01, 5 August 2024 (UTC)[reply]

Are you perhaps asking about WP:Reliable sources/Perennial sources (shortcut WP:RSP) - which lists those sources we have discussed multiple times? Blueboar (talk) 10:27, 5 August 2024 (UTC)[reply]
Yeah, that's what I was trying to find. NamelessLameless (talk) 22:50, 9 August 2024 (UTC)[reply]
Note that we don't have (and can't have) either an exhaustive list of 'reliable' sources, or of 'unreliable' ones. Instead, we have policy describing the types of sources that are likely to be considered reliable, and mechanisms for discussing whether a particular source should be considered reliable for particular content. WP:RSNP consists of a list of repeatedly discussed sources only. Generally speaking, these tend to be edge cases of one sort or another. AndyTheGrump (talk) 10:54, 5 August 2024 (UTC)[reply]
The list itself is at Sources Mcljlm (talk) 14:53, 5 August 2024 (UTC)[reply]
That is only a list of sources that have been discussed regularly at RSN; it isn't close to being a full list of consensus on which sources are reliable. As well as discussions on RSN that don't appear on the list, many projects maintain lists related to their areas. -- LCU ActivelyDisinterested «@» °∆t° 16:44, 5 August 2024 (UTC)[reply]
How can the various lists be found? Mcljlm (talk) 06:36, 6 August 2024 (UTC)[reply]
Prior discussions on RSN can be found by searching the archives; there's a search block in the RSN header. I don't know of any easy way of finding all the project lists. NPP maintain quite a big list, Wikipedia:New page patrol source guide, but it still won't be a complete list and they have their own reasons for maintaining it. Ultimately the reason there isn't a single list is that editors should be looking to the relevant policy and guideline, and using their own good judgement. The consensus lists are meant to help editors when disagreement exists about verification of article content, so the same discussions don't have to happen repeatedly. -- LCU ActivelyDisinterested «@» °∆t° 10:10, 6 August 2024 (UTC)[reply]
@NamelessLameless and @Mcljlm, I am curious why you expect a list to exist. Did another editor perhaps claim that a source you wanted to use wasn't on an approved list?
There are somewhere around 1,500,000,000 websites. If an editor spent just one minute assessing each of them, it would take nearly 3,000 years of round-the-clock work – 24 hours a day, 365 days a year, for 40 lifetimes – to make such a list. Also, because websites spring up and then get removed, the list would be seriously out of date even after a few years. It is impossible. There is no list, and there never will be any such list. WhatamIdoing (talk) 21:23, 7 August 2024 (UTC)[reply]
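For anyone who wants to check the arithmetic above, here is a minimal sketch. It takes the figures from the comment (about 1.5 billion websites, one minute each, non-stop work); the function name `years_to_review` and the 365-day year are illustrative assumptions, not anything from policy.

```python
# Rough check of the "one minute per website" estimate.
# Assumptions: 1.5 billion sites, 1 minute each, work continues
# 24 hours a day, 365 days a year (leap days ignored).

MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600 minutes


def years_to_review(site_count: int, minutes_per_site: float = 1.0) -> float:
    """Return the years of round-the-clock work needed to assess every site."""
    return site_count * minutes_per_site / MINUTES_PER_YEAR


years = years_to_review(1_500_000_000)
print(f"{years:,.0f} years of non-stop work")
print(f"{years / 40:.0f} years per person if split across 40 lifetimes")
```

Under these assumptions the result comes out a little under 3,000 years, which is consistent with the rough figure quoted in the comment.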

Do reliable sources have to be informed of their use in a Wikipedia article?

Do reliable sources have to be informed of their use in a Wikipedia article? Howie Marx (talk) 07:01, 21 August 2024 (UTC)[reply]

Why would/should they be? Headbomb {t · c · p · b} 07:04, 21 August 2024 (UTC)[reply]
I didn't know if we needed their permission or not to include them as a reliable source in an article ... I can't think of any reason why they would object to this ... surely it's good for them too? (as long as it's factual)
Do you think they should be informed Headbomb? Howie Marx (talk) 07:27, 21 August 2024 (UTC)[reply]
No they don't need to be informed, and their objections wouldn't matter if they had any. -- LCU ActivelyDisinterested «@» °∆t° 10:03, 21 August 2024 (UTC)[reply]
ok thx Howie Marx (talk) 03:20, 26 September 2024 (UTC)[reply]

Reliability and Time

How much does time factor into the reliability of sources and the accuracy of information? For example, say an article has multiple sources - enough to pass WP:N and WP:V. However, although the sources are regarded as reliable - as in Generally or Marginally - the info from them may be outdated. Perhaps the article was of an older topic that was notable but wasn't created until after a long time has passed. Yet, the only evidence that proves such info is out of date comes from primary and/or potentially unreliable sources. Since the site requires that articles are based on secondary sources, what would an editor do in this situation? Is it better to leave the article intact until a new reliable secondary source is found? Or should the article be updated with current information, even if that info is from dubious sourcing (or even none at all)? I was under the impression that WP:VNT and WP:NTEMP apply, but is it concerning to leave it even if no other sources ever emerge? Thanks, PantheonRadiance (talk) 22:25, 22 August 2024 (UTC)[reply]

To me, the distinction between primary and secondary sources is prescient: a source becomes primary when the context in which it was written has been sufficiently diverged from by "our own", that it no longer suffices to transparently verify claims and communicate information to the reader. That is, if the reader attempted to make deductions while intuitively applying the understanding of the world presently around them onto the text, they would get crucial facts wrong. A primary source requires an additional layer of interpretation and expertise between it and transparent, verifiable claims we can cite. Remsense ‥  22:43, 22 August 2024 (UTC)[reply]
To directly answer your question: I guess we'll have to find out when it happens? It seems like it would depend entirely on what the article is specifically about. Remsense ‥  22:43, 22 August 2024 (UTC)[reply]
Thanks for your reply! I think I get your point, but just to clarify: unless the info is inherently obvious, most of the time the info from primary sources needs analysis and expertise before we can use it on Wikipedia? And such analysis can generally only be done through a reliable secondary source? That's my line of thinking too. I believe that when it comes to the ease of spreading misinformation online, secondary reliable sources should definitely be used to combat that. PantheonRadiance (talk) 23:15, 22 August 2024 (UTC)[reply]
Also for a specific example, my impetus to discuss this was based on Smosh Games. The article became a redirect ten years ago due to a lack of notability. Recently I uncovered multiple sources found since then that proved notability per WP:WEB. However, there was recent contention regarding the info in the article, namely whether much of it was inaccurate because it was outdated - due to the ten years since the AfD. While I believed in sustained notability, another editor claimed its inaccuracy, and continuously added info attempting to update it, without verifying that the info came from secondary sources. I objected to that due to failing WP:V and WP:OR among other MOS guidelines. Needless to say it's a messy debate. PantheonRadiance (talk) 23:20, 22 August 2024 (UTC)[reply]
@PantheonRadiance, inaccuracy gets solved with the [Edit] button, not the delete button. If an old source says "500 members" or "revenue of $2 million", and that's alleged to be inaccurate due to being out of date, then copyedit it to say "500 members in 1965" or "revenue of $2 million in the 2015–2016 fiscal year". WhatamIdoing (talk) 15:53, 30 August 2024 (UTC)[reply]
WhatamIdoing Sorry for the late reply, but thanks for your advice. I did keep in mind MOS:REALTIME and MOS:DATED when I rewrote the page, trying to say "In year X, they did Y" as much as I could. But I guess it still wasn't enough anyways. PantheonRadiance (talk) 07:00, 9 September 2024 (UTC)[reply]

Dubious

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
Please discuss at Wikipedia:Village pump (miscellaneous) WhatamIdoing (talk) 15:57, 30 August 2024 (UTC)[reply]

What should be done to overhaul the {{dubious}} template? Literally every time I've seen it on an article, there is zero discussion on the talk page about what may be dubious in the article. I discussed this on the talk page a while back, but the discussion just went around in circles and fizzled out. Should a drive be done to remove drive-by instances of this tag where no discernible discussion exists? Ten Pound Hammer(What did I screw up now?) 19:41, 28 August 2024 (UTC)[reply]

Also asked at WP:VPM and at WT:V… Please don’t ask the same question at three venues. Consolidate the discussions. Blueboar (talk) 20:40, 28 August 2024 (UTC)[reply]
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Bold edit on WP:AGEMATTERS

In the passage "With regard to historical events, older reports (closer to the event, but not too close such that they are prone to the errors of breaking news) can be less likely to have errors introduced by repeated copying and summarizing", I have changed "are" to "can be".

The main reason for this is ancient primary sources (Plutarch, Livy, Sallust, Cicero, Polybius, Thucydides, etc). They are in fact older and closer to historical events. They are not, however, necessarily more reliable. The transmission chains for these sources are complicated both in terms of how they were written (see e.g. Quellenforschung) and how they were copied to the present (e.g. emendation). As a counterexample, it is now relatively common to question descriptions given in, say, Livy on the basis of alternate versions in Dio, even though Dio is later than Livy; similar issues pop up in emendation, where the "earliest" version of a manuscript is not necessarily the one which is accepted. A. E. Housman, in a rather old review that is very fun to read in a base way, a few times aimed his (extremely sharp) skills of invective at that exact assumption.

I noticed this while doing some edits to an essay of mine (User:Ifly6/Primary sources in classics) which also explains why ancient primary sources are problematic. Anyway, I thought the statement rather broad and weakened it. Ifly6 (talk) 17:36, 29 August 2024 (UTC)[reply]

@Ifly6, what do you think about deleting the whole paragraph?
[Text: "With regard to historical events, older reports (closer to the event, but not too close such that they are prone to the errors of breaking news) can be less likely to have errors introduced by repeated copying and summarizing. However, newer secondary and tertiary sources may have done a better job of collecting more reports from primary sources and resolving conflicts, applying modern knowledge to correctly explain things that older sources could not have, or remaining free of bias that might affect sources written while any conflicts described were still active or strongly felt."]
Even if I think it's 100% true, I'm not sure that tells editors anything actionable. It's kind of a long-winded way of saying "Better sources are better". WhatamIdoing (talk) 01:05, 31 August 2024 (UTC)[reply]
I don't have any objection to deleting the paragraph. Ifly6 (talk) 23:54, 2 September 2024 (UTC)[reply]

Newspapers

I was recently reading the Wikiedu subject-specific guide on reliable sourcing, in preparation for training students in a history class on how to complete their Wikipedia assignment. The subject-specific guideline says to avoid using "most newspaper articles from the period you're writing about" (page 2) as sources, because they are considered a primary source. I was surprised to read that, but on reflection, it makes sense. I was wondering if that would merit mentioning on this page. How old does a newspaper article have to be to count as a primary source rather than a secondary source? Rachel Helps (BYU) (talk) 17:53, 5 September 2024 (UTC)[reply]

See #Bold edit on WP:AGEMATTERS (above). —Bagumba (talk) 20:41, 5 September 2024 (UTC)[reply]
@Rachel Helps (BYU), you might find WP:PRIMARYNEWS useful. WhatamIdoing (talk) 03:59, 6 September 2024 (UTC)[reply]
Assessment of "primary" and "secondary" is contextual. For intro undergrad students, it's probably best to just say that newspapers are entirely primary, or if not now, they will be within a few months. The more nuanced approach is that there are different types of newspaper articles, and different types of content in a newspaper article, and the context in which one uses such content also determines its suitability for whatever parameters of sourcing you have.
I'll use examples from the freely accessible BBC today: in "Telegram CEO Durov says his arrest 'misguided'", the line "the BBC learned last week that Telegram has refused to join international programmes ..." would be primary source material, as it is the BBC's original reporting from that series of articles. The quotations in the article are taken from Durov's Telegram post, which, considering that the BBC and others would verify that account and statement, makes this a secondary source for that statement around this time of publication. However, after some months, it may no longer be as important that Durov said something on this specific day, but rather what he said, in which case a line that cites this article for that information would use it as a primary source. Finally, the factual statements about Telegram and Durov's bio at the end will probably be considered citable as a secondary source for many years (although since the source of that information is not given, it is the worst kind of secondary source). And hopefully I've conveyed that the context in which it is cited matters. SamuelRiv (talk) 04:25, 6 September 2024 (UTC)[reply]
Thank you for the explanations! Rachel Helps (BYU) (talk) 18:05, 6 September 2024 (UTC)[reply]
I don't agree with the analysis by @SamuelRiv. The bit about "it was copied, which makes this a secondary source" is WP:LINKSINACHAIN. What makes (some or all of) a source be secondary is the addition of original thinking to prior publications. If the BBC not only repeated the original quotation but added, e.g., something about the implications of this quotation, the analysis of what the quotation reveals about how the arrest is affecting him, how the BBC believes this compares to a similar case, etc., then it would be a secondary source. Merely repeating what someone else said is still primary. WhatamIdoing (talk) 00:38, 7 September 2024 (UTC)[reply]
Yes, independent thought is necessary, but independent content is not necessary to define a secondary source. (And again, one shouldn't try to find some rigid definitions outside of the usage context.) Before a reputable newspaper reprints, in an article, what is claimed to be a statement from someone else, it verifies the statement. That makes it a secondary source for that statement. If they republished it indiscriminately, or at face value without any other reporting on the topic, then it would not be secondary. SamuelRiv (talk) 00:54, 7 September 2024 (UTC)[reply]
I have read that newspapers no longer attempt to verify most statements. Fact checking is expensive. In particular, if the story says "The Daily Newspaper has reported that...", then I think we can assume that the original is being taken at face value. This is not bad; professional journalism is a small world, and its members are better suited for identifying who's trustworthy and who's not than Wikipedia editors are.
But all of this is a tangent: Fact checking produces a reliable source, not a secondary one. This would be a secondary source:
"News reports are conflicting. The Daily Newspaper quotes the mayor as saying 'Aliens are invading!', and The Weekly Standard quotes the police chief as saying 'It's all a big hoax'."
This example is secondary because they are comparing different reports. That would be secondary even if it were on a self-published blog, written by a non-expert, with no reputation for fact-checking or any of the other qualities we value. You couldn't use such a source because it would be totally unreliable, but Wikipedia:Secondary does not mean good, and it would be secondary. WhatamIdoing (talk) 05:47, 7 September 2024 (UTC)[reply]
I didn't use the term 'fact-checking' (as in the editorial procedure for each story). I say verifying as in independently verifying the reporting of a news outlet before re-reporting their content. That may mean some combination of fact-checking, interviewing the original journalist and editor, cross-checking with one's own sources, or some other process. The point is that we believe that a major news outlet generally does this (and they may outline their procedures publicly). SamuelRiv (talk) 12:39, 9 September 2024 (UTC)[reply]
What makes you think this actually happens? WhatamIdoing (talk) 21:14, 9 September 2024 (UTC)[reply]
It is the definition of a WP:RS; if you believe an individual source is failing to do that, you should raise an objection on talk or take it to WP:RSN if the problem is systematic. If you believe all sources are failing to do that then there's nothing to be done; Wikipedia, by its nature, assumes that RSes exist. (Keep in mind that of course even the best RS will sometimes have some failures - that's very different from asserting that they have no fact-checking process at all!) --Aquillion (talk) 15:56, 13 October 2024 (UTC)[reply]
There's also the theory of accountability. Even if an outlet does not actively check every fact on every article, or even the majority of facts, if its reputation and financial stability hinge on its resistance to the scandal of major inaccuracy (if indeed major inaccuracy would be scandalous to that outlet, which is pretty much a requirement for RS here), then one can have some degree of confidence in the passive processes that would also serve rigor and accuracy in an institution.
Because resources are finite in every institution (even in some golden age of X), one always has to, in practice, place some faith in passive forces. SamuelRiv (talk) 16:40, 13 October 2024 (UTC)[reply]

"Appropriate"

"Each source must be carefully weighed to judge whether it is reliable for the statement being made in the Wikipedia article and is an appropriate source for that content."

What is intended by distinguishing appropriateness from being reliable for a statement being made? Is it referring to WP:UNDUE, or is it elaborating on how to evaluate whether a source is reliable for a statement being made?

Rollinginhisgrave (talk) 11:24, 10 October 2024 (UTC)[reply]

This statement is about whether the source is appropriate for the content (e.g., reliable), not about whether the content is appropriate for the Wikipedia article (e.g., UNDUE). WhatamIdoing (talk) 21:44, 11 October 2024 (UTC)[reply]
We've never really needed a definition of Wikipedia:Appropriate. Until last year or so, I don't remember anyone even asking about it. I recently expanded Wikipedia:Identifying and using primary sources#"Secondary" does not mean "good" to provide a bit of an explanation.
I think it's easier to understand that some sources are obviously inappropriate than to define appropriate directly. WhatamIdoing (talk) 21:47, 11 October 2024 (UTC)[reply]
If it is referring to being reliable for the statement, how is it not a tautology? I'm just not sure what it's adding. Rollinginhisgrave (talk) 22:45, 11 October 2024 (UTC)[reply]
"E.g." means "For example". It does not mean "A complete list of all factors". An unreliable source would obviously be inappropriate, so I give it as an example of one way in which a source could be inappropriate. If you would like others, then see the examples in WP:NOTGOODSOURCE that I linked above. WhatamIdoing (talk) 20:13, 12 October 2024 (UTC)[reply]
I understand; this is just not communicated in the quoted text at the top. If a source being reliable for a statement were entailed in "appropriate", as in your example, then it would be sufficient to just say the source is appropriate and omit the first half of the sentence. I think rewording the last part "and is [otherwise] an appropriate source for that content" would clear this up, especially with the link to WP:NOTGOODSOURCE embedded. Rollinginhisgrave (talk) 07:15, 13 October 2024 (UTC)[reply]
  • I think that "appropriate", in this context, is a catchall for the sorts of things detailed in the reliability in specific contexts section and our more specific contextual content policies like WP:BLP, WP:EXCEPTIONAL, WP:MEDRS, WP:FRINGE and so on. The one thing that isn't spelled out, which perhaps should be, is that sources with relevant expertise are generally preferred over ones that lack it, but that isn't necessarily a requirement outside of WP:MEDRS, and other factors can complicate things - a news article written by someone with no expertise can still summarize the opinions of experts in ways that we can cover in the article voice, after all; whereas an academic paper may represent a single study whose results are outside of the mainstream. --Aquillion (talk) 15:40, 13 October 2024 (UTC)[reply]
    An example differentiating “reliable” from “appropriate” … Suppose Joe TikTok writes in his social media: “Today was my 21st birthday, and I was enjoying it until a bunch of Anti-(cause) protestors stormed the restaurant and…” followed by a long rant on how he now hates Anti-(cause) protestors.
    That social media post can be considered ABOUTSELF reliable for verifying his birthday, but I would argue that the rest of the content makes it inappropriate to use for that purpose. Blueboar (talk) 17:52, 13 October 2024 (UTC)[reply]
I'll be honest Blueboar, I've never encountered this before, it's very interesting to hear. To my eyes, it's not at all a natural reading from "appropriate". And I probably would have considered it appropriate. Even if it's not written verbatim in policy it's good to know.
Aquillion, I see WP:BLP, WP:EXCEPTIONAL etc as all entailed in the first half of the sentence. For instance, would you say a source failing BLP is still reliable for the statement being made in the Wikipedia article? I can see how it could apply to generally preferring expertise, albeit not to MEDRS, per the same justification as my prior BLP example. I am lightly bludgeoning this and it's not gaining traction, so I might leave it there, thanks for weighing in. Rollinginhisgrave (talk) 10:10, 14 October 2024 (UTC)[reply]