Chinese Gender Calendar Scientific Evidence: What Peer-Reviewed Studies Actually Show

Written by Sukie Chinese | Last Updated: May 23, 2026 | Last Reviewed: May 23, 2026

Whether there is scientific evidence for the Chinese gender calendar is one of the most-searched questions about the chart, and the short academic answer is also the most disappointing one for believers: across the most-cited peer-reviewed analyses published on this topic, the calendar predicts the sex of a baby with accuracy that is statistically indistinguishable from a coin flip. This page is the deeper, citation-heavy version of that answer: an academic walk through the primary published studies, what their methodologies were, what their findings actually say, and what those findings mean both statistically and biologically.

The two most-cited investigations are Katz et al. (1999), published in the American Journal of Obstetrics and Gynecology, and a much larger Swedish population analysis from Villamor et al. (2005). A subsequent analysis from researchers connected to the Chinese University of Hong Kong around 2010 reached the same conclusion. None of these papers, nor any other published peer-reviewed work this guide has located, has been able to demonstrate predictive accuracy above chance. Below, we work through the methodology behind those findings, the biology of why no calendar method can outperform 50%, and what mainstream medical bodies such as ACOG and the NIH actually recommend instead. For a softer consumer-facing summary, see our “Is the Chinese gender calendar accurate?” page; for a general overview, see our Chinese gender calendar accuracy guide. This page is for readers who want the academic version with citations.

The Question of Scientific Evidence

Before working through the studies themselves, it is worth defining what “scientific evidence” actually means in the context of a predictive method like the Chinese gender calendar. In clinical research, a predictive tool is not judged by how convincing it feels to its users. It is judged by whether, in a large enough sample of independent cases, its predictions match the actual outcomes at a rate meaningfully better than chance — and whether that result holds up across multiple studies done by different research groups.

For a binary outcome like baby sex, “chance” means roughly 50%, because the live-birth sex ratio at the population level is close to a one-to-one split (with a very small natural skew toward males). That sets the bar a predictive method must clear: simply being right half the time proves nothing, because flipping a coin will produce the same result. To count as real evidence, the calendar would need to consistently land somewhere above 50% — ideally well above — in carefully designed studies.

Another piece of the puzzle is peer review. Peer review is the process where a paper is submitted to a recognized academic journal, anonymous expert reviewers in the field critique the methodology and statistics, and the paper is only published if those reviewers find the work sound. A claim in a blog post or a YouTube video is not evidence. A claim published in a journal like the American Journal of Obstetrics and Gynecology — AJOG, one of the most influential obstetrics journals in the world — carries weight precisely because it has cleared that bar. That is the lens we apply to the Chinese gender calendar in the sections below.

The 1999 Katz Study (American Journal of Obstetrics and Gynecology)

The most frequently cited investigation of the Chinese gender calendar in the English-language medical literature is the 1999 letter and analysis by Katz and colleagues, published in the American Journal of Obstetrics and Gynecology. The full citation is indexed at the United States National Library of Medicine, where the abstract can be read directly on PubMed (PMID 10519541). The paper is short by modern standards but it remains the foundational piece of evidence in this discussion because of where it was published and how cleanly it was designed.

The methodology was straightforward. The investigators took the lunar age of the mother at conception and the lunar month of conception for each case in their cohort, looked up the predicted sex on the standard Chinese gender calendar, and compared that prediction against the documented sex of the baby recorded at delivery. They then calculated the proportion of correct predictions and tested whether that proportion was significantly different from 50%.
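The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' actual analysis: the function name `accuracy_vs_chance`, the choice of a normal-approximation z-test, and the example counts are all hypothetical and are not taken from the paper.

```python
import math

def accuracy_vs_chance(correct: int, total: int, baseline: float = 0.5):
    """Return (accuracy, z, two_sided_p) for a one-sample proportion z-test.

    Uses the normal approximation to the binomial, which is reasonable
    when the sample holds at least a few dozen cases.
    """
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    accuracy = correct / total
    # Standard error of a proportion if the true hit rate equals the baseline
    se = math.sqrt(baseline * (1 - baseline) / total)
    z = (accuracy - baseline) / se
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return accuracy, z, p

# Hypothetical counts for illustration only (not data from any study)
acc, z, p = accuracy_vs_chance(correct=102, total=200)
print(f"accuracy={acc:.3f}  z={z:.2f}  p={p:.3f}")
```

With these made-up counts the p-value is far above the conventional 0.05 threshold, which is what "not significantly different from 50%" looks like in practice.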

The headline finding: the chart performed at approximately chance level. Once the authors accounted for the natural sex ratio at birth, there was no statistically significant evidence that the calendar predicted baby sex any better than a random guess. The authors concluded that the Chinese gender calendar should not be used as a predictive instrument and framed it instead as a cultural curiosity. That conclusion has not, to this day, been overturned by any subsequent peer-reviewed work this guide has been able to find.

It is worth noting what Katz et al. did not claim. They did not claim the calendar is harmful, dangerous, or worthy of dismissal as folklore. They did not claim that no parent should ever look at it. The paper is narrowly scientific: as a predictive method, against the standard of statistical significance, the chart does not perform. That is the entire scientific point. Almost everything else circulating online about the calendar is a layer of cultural commentary built on top of that one quiet finding.

The 2005 Swedish Study (Villamor et al.)

The 1999 Katz study had one structural weakness that any reasonable critic would raise: the sample size was modest. A few hundred births is enough to rule out a large effect, but it cannot exclude a small one, so it is not enough to be the definitive last word. That gap was largely closed six years later by Villamor and colleagues, working with Swedish national birth registry data, some of the cleanest population health data in the world.

The Villamor analysis examined nearly three million Swedish births, applying the standard Chinese gender calendar prediction to each case for which maternal age and month of conception could be reasonably inferred. With a sample of that magnitude, even a tiny real effect should have surfaced as statistically significant: chance variation is squeezed out of the math when the dataset is large enough. If the chart contained any genuine predictive signal — even one or two percentage points above chance — a Swedish-registry-scale dataset would have caught it.
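To make the sample-size argument concrete, here is a rough sketch of how the chance band around 50% narrows as the number of births grows. The function name `chance_margin` is hypothetical, the 1.96 multiplier is the usual normal-approximation value for a 95% band, and the cohort size of 300 is only a stand-in for "a few hundred births", not a figure from either paper.

```python
import math

def chance_margin(n: int, z: float = 1.96) -> float:
    """Approximate 95% half-width of the noise band around 50% for n predictions."""
    return z * math.sqrt(0.25 / n)

# 300 is an assumed stand-in for a small cohort; 2,800,000 is the
# approximate registry size quoted in this article.
for n in (300, 30_000, 2_800_000):
    print(f"n={n:>9,}: 50% +/- {100 * chance_margin(n):.2f} percentage points")
```

Under these assumptions the band is several percentage points wide for a few hundred births and well under a tenth of a point at registry scale, which is why a one- or two-point effect could hide in a small cohort but not in the Swedish data.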

The result was the same. Accuracy hovered around 50%, with no statistically significant deviation from what you would expect by chance. Two studies, two research groups, two continents, two very different sample sizes, one conclusion. That kind of convergent finding is what scientists mean by “robust”: when independent investigations using different methods reach the same answer, the answer tends to be the right one.

A subsequent analysis associated with researchers connected to the Chinese University of Hong Kong, completed around 2010, reached the same conclusion on its regional dataset. The cumulative published picture, taken across the Katz cohort, the Villamor Swedish registry, and the Hong Kong analysis, is that no peer-reviewed study has demonstrated above-chance accuracy for the Chinese gender calendar. That is the body of evidence as it currently stands.

A Studies-Data Table

The table below summarizes the published peer-reviewed work most often cited on this question. It is not exhaustive, but it captures the studies that shape the current scientific consensus. The pattern across rows is what matters most: different sample sizes, different countries, different decades, same finding.

Study	Year	Sample Size	Finding	Source
Katz et al., Am J Obstet Gynecol	1999	Several hundred births	~50% accuracy; no statistical significance vs. chance	PubMed 10519541
Villamor et al. (Swedish registry analysis)	2005	~2.8 million Swedish births	~50% accuracy; no signal above chance at population scale	Indexed in NIH/NLM databases
Chinese University of Hong Kong-affiliated analysis	~2010	Regional cohort	Accuracy not significantly different from 50%	Regional obstetric literature

Two notes on reading this table. First, “~50%” is not a rhetorical hedge; it is the actual reported figure, with the precise decimal value varying slightly across studies in ways that are statistically indistinguishable from the natural live-birth sex ratio. Second, these are the rigorous studies. The internet abounds with informal “polls” and self-reported testimonials that report much higher accuracy (sometimes 70% or 80%); those numbers do not survive selection bias correction and have never been replicated in any peer-reviewed work.

What “50% Accuracy” Actually Means Statistically

The figure that anchors this entire discussion (roughly 50% accuracy) is easy to misread, so it is worth pausing on. When statisticians say a predictive method performs at “chance,” they do not mean it never gets the answer right; they mean its observed accuracy is consistent with the simplest baseline possible. For a binary outcome with a roughly even base rate, that baseline is a coin flip.

Concretely, imagine you flipped a fair coin to predict the sex of every baby in a hospital for a year. You would land on the correct answer in approximately half of the cases, simply because two outcomes are roughly equally likely. If the Chinese gender calendar lands at the same accuracy as that imaginary coin, then the calendar is not contributing information; the coin would have done equally well. That is the meaning of the published finding.

A second concept that matters here is the confidence interval. Even a true 50/50 method, tested on a finite sample, will not return exactly 50.000% accuracy — it will scatter slightly, perhaps showing 49.4% or 51.1% on any single dataset, simply because finite samples are noisy. A result is considered statistically significant only when the deviation from chance is larger than the noise band that the sample size produces. Katz et al. and Villamor et al. both reported accuracy figures that fell inside the chance band: the small wobble around 50% was exactly what would be expected from a method with no real predictive signal. That is why their authors concluded the chart performs at chance.
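The scatter described above is easy to see in a small simulation. The sketch below is illustrative only: it assumes an exactly 50/50 sex ratio, uses a predictor that guesses at random, and picks an arbitrary sample of 400 births per run; none of these numbers come from the published studies.

```python
import math
import random

def simulate_accuracy(n_births: int, rng: random.Random) -> float:
    """Accuracy of a coin-flip 'predictor' on n_births simulated births."""
    hits = 0
    for _ in range(n_births):
        actual = rng.random() < 0.5   # assumed 50/50 sex ratio
        guess = rng.random() < 0.5    # predictor with no real information
        hits += (actual == guess)
    return hits / n_births

rng = random.Random(0)                 # fixed seed so the run is repeatable
n = 400                                # arbitrary sample size per run
band = 1.96 * math.sqrt(0.25 / n)      # approximate 95% noise band
runs = [simulate_accuracy(n, rng) for _ in range(1000)]
inside = sum(abs(a - 0.5) <= band for a in runs) / len(runs)
print(f"lowest={min(runs):.3f}  highest={max(runs):.3f}  "
      f"share inside band={inside:.2f}")
```

Individual runs land a few points above or below 50%, yet the large majority stay inside the band. A real signal would have to push the observed accuracy outside that band, and keep doing so on new samples.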

This is also why anecdotal reports along the lines of “the chart got it right for me!” tell us almost nothing on their own. With a base rate near 50%, half of any group of parents will have had the chart land correctly purely by luck. That subset will reasonably remember and talk about it. The other half, equally large, will tend not to.

Why the Chart Cannot Predict Gender (the Biology)

The statistical findings above are reinforced by something even more basic: the biology of sex determination. Whether a baby is born male or female is decided at fertilization by which sperm cell fertilizes the egg. Sperm carrying a Y chromosome produce males; sperm carrying an X chromosome produce females. The mother contributes only X chromosomes, so the determining input is entirely a property of the single sperm cell that fertilizes the egg. Once that happens, the chromosomal sex is fixed.

The Chinese gender calendar uses two inputs: the mother’s lunar age and the lunar month of conception. Neither input has any known biological mechanism by which it could influence whether an X-bearing or Y-bearing sperm reaches the egg first in any given act of conception. There is no published research demonstrating that the calendar month systematically shifts the X/Y ratio of sperm reaching the egg, and the maternal-age signal that does exist in the literature (a very slight shift in live-birth sex ratio with maternal age) is far too small to drive a calendar-based prediction.

Some of the biological reasons no calendar method can outperform chance are worth listing explicitly:

Sex is determined by the chromosomal content of the sperm at the instant of fertilization, not by any maternal characteristic.
The mother contributes only X chromosomes, so her age and the month of conception cannot supply the Y chromosome that determines male sex.
Sperm motility and viability are not measurably modulated by the lunar calendar in any peer-reviewed reproductive physiology research.
The natural live-birth sex ratio is close to 1:1, so over a large enough sample any method that does not contain real biological information will converge toward 50%.
Published reproductive endocrinology has identified no maternal-age or seasonal effect large enough to produce above-chance predictive accuracy.
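The convergence claim in the list above follows from simple probability, sketched here in Python. The function name `expected_accuracy` is hypothetical, and the 0.512 share of boys is an assumed approximate figure (roughly 105 boys per 100 girls) that does not come from this article's sources.

```python
def expected_accuracy(p_predict_boy: float, p_boy: float) -> float:
    """Expected hit rate when the prediction is independent of the outcome.

    A hit is either (predict boy AND boy) or (predict girl AND girl).
    """
    return p_predict_boy * p_boy + (1 - p_predict_boy) * (1 - p_boy)

P_BOY = 0.512  # assumed approximate share of boys among live births

for p_predict in (0.0, 0.3, 0.5, 1.0):
    even = expected_accuracy(p_predict, 0.5)
    skewed = expected_accuracy(p_predict, P_BOY)
    print(f"predicts boy {p_predict:.0%} of the time: "
          f"even ratio -> {even:.3f}, skewed ratio -> {skewed:.3f}")
```

With an exactly even sex ratio, every information-free method scores 50% no matter how often it says “boy.” With the slight male skew assumed here, the best such a method can do is always guess “boy” and score about 51%, which is still a guess.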

This is the deeper reason the empirical finding (~50% accuracy) is not surprising to obstetricians. The calendar uses inputs that biology does not use, so its predictions reduce to a guess. For a complementary discussion comparing the Chinese chart to another popular but unvalidated method — the Ramzi theory — see our deep dive on the Chinese gender calendar vs. Ramzi theory.

Confirmation Bias and Why People Believe Anyway

If the evidence is this consistent, why does belief in the chart persist? The answer is mostly psychological, not statistical. Two well-documented cognitive biases conspire to make a 50/50 method feel uncannily accurate to the people who try it.

The first is confirmation bias: people more readily notice and remember the cases where a belief was confirmed and quietly discount the cases where it was disconfirmed. The second is selection bias in online testimonials: the parents who post “the chart was right!” on forums and TikTok far outnumber those who post “the chart was wrong” (which is rarely entertaining content). The result is that the public conversation systematically over-represents correct predictions, even when the underlying base rate of correctness is exactly 50%.

Sidebar: A Quick Cognitive-Bias Walkthrough

Imagine 1,000 parents use the chart. Roughly 500 get a correct prediction purely by chance. Of those 500, perhaps 100 post about it online. Of the 500 whose prediction was wrong, perhaps 5 post about it. A casual reader scrolling through testimonials sees 100 confirmations versus 5 disconfirmations and concludes the chart has a 95% accuracy rate. The base rate was still 50%. Nothing about the chart changed. Only the visibility of the outcomes changed. This pattern, repeated across every viral prediction-method post on the internet, is why personal-anecdote evidence is the weakest possible foundation for any predictive claim.
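The arithmetic in this walkthrough can be written out as a short sketch. The function name and parameters are hypothetical; the posting rates of 20% and 1% are just the sidebar's own numbers (100 of 500 and 5 of 500) expressed as rates.

```python
def apparent_accuracy(users: int, true_accuracy: float,
                      post_rate_if_right: float,
                      post_rate_if_wrong: float) -> float:
    """Share of visible testimonials that report a correct prediction."""
    right = users * true_accuracy
    wrong = users - right
    posts_right = right * post_rate_if_right   # confirmations a reader sees
    posts_wrong = wrong * post_rate_if_wrong   # disconfirmations a reader sees
    return posts_right / (posts_right + posts_wrong)

# Sidebar scenario: 1,000 users, 50% true accuracy,
# 100 of 500 correct cases posted, 5 of 500 incorrect cases posted.
visible = apparent_accuracy(1000, 0.5, 0.20, 0.01)
print(f"true accuracy: 50%  |  accuracy a reader sees: {visible:.0%}")
```

Changing only the two posting rates moves the visible figure anywhere between 0% and 100% while the true accuracy stays at 50%, which is the whole point of the sidebar.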

None of this means the people who report a correct prediction are lying or foolish. It simply means that the structure of how those reports reach the public is shaped by selection effects that no individual is fully aware of. That structure is exactly what large peer-reviewed studies are designed to cut through — and when they cut through it, the underlying accuracy resolves to 50%.

What Doctors and Medical Bodies Say

Major obstetrics organizations do not list the Chinese gender calendar among the methods they recognize for fetal sex determination. When the American College of Obstetricians and Gynecologists discusses prenatal screening and diagnostic tests, the named methods for determining fetal sex are ultrasound, non-invasive prenatal testing (NIPT)/cell-free DNA, chorionic villus sampling (CVS), and amniocentesis. Traditional or calendar-based methods do not appear on those lists, because none of them have cleared the peer-reviewed validation bar that those clinical tools have.

The same picture holds in the broader medical literature curated by the National Institutes of Health and indexed in PubMed. A search of PubMed for “Chinese gender calendar” or “Chinese lunar calendar fetal sex” surfaces a handful of papers. The recurring conclusion across the high-quality ones matches the Katz and Villamor work: accuracy is at chance, the method is best understood as a cultural tradition, and it should not be used for clinical decision-making.

This is not a hostile or dismissive position. Medical bodies routinely acknowledge that cultural practices around pregnancy — gender reveal rituals, naming traditions, food and rest customs — play a meaningful role in family life and emotional wellbeing. The narrow point they make is the predictive one: when the question is “what sex is this baby going to be,” the answer should come from a tool with documented accuracy above chance, and the Chinese gender calendar is not such a tool.

How Modern Medical Methods Compare

The contrast between a ~50% method and clinically validated modern tools is large. The methods listed below are the ones that obstetricians actually use when an expectant parent wants to know the sex of the baby with confidence.

Non-invasive prenatal testing (NIPT, cell-free fetal DNA). Available from approximately 10 weeks of pregnancy, NIPT analyzes fragments of fetal DNA circulating in the mother’s bloodstream and reports fetal sex with approximately 99% accuracy in validated clinical studies. NIPT is also the most accurate non-invasive option currently available for screening certain chromosomal conditions.
Mid-pregnancy ultrasound. The anatomy scan typically performed between 18 and 22 weeks of pregnancy can identify fetal sex with approximately 90–95% accuracy depending on fetal position and the skill of the sonographer. It is the most common method for parents who want to know during a standard pregnancy.
Chorionic villus sampling (CVS). An invasive diagnostic test performed around 10–13 weeks, CVS reports fetal sex (and chromosomal results) with effectively 99%+ accuracy, but carries a small procedural miscarriage risk and is usually reserved for cases with a clinical reason to test.
Amniocentesis. Performed around 15–20 weeks, amniocentesis also reports fetal sex with effectively 99%+ accuracy. Like CVS, it is an invasive test reserved for clinical indication rather than routine curiosity.

For a broader comparison of these and other approaches in the U.S. context, including how parents typically choose among them, see our overview of U.S. gender prediction methods. The point of including this comparison here is not to argue that the Chinese gender calendar should be replaced by clinical testing for every reader — many parents enjoy the calendar precisely because it is non-clinical. The point is that the gap between ~50% accuracy and ~99% accuracy is the entire reason medical bodies recommend the clinical methods for any decision that actually depends on knowing the sex of the baby.

The Verdict on Scientific Evidence

Bringing the threads together: the scientific evidence on the Chinese gender calendar is unusually clean for a question that has been asked this many times. Across a small, well-designed cohort study published in a top obstetrics journal (Katz et al., 1999), a population-scale registry analysis (Villamor et al., 2005), and at least one regional follow-up, the calendar predicts baby sex at roughly 50% — statistically indistinguishable from chance. There is no published peer-reviewed evidence to the contrary, and there is a clean biological reason why such evidence would be a surprise: the calendar uses inputs that the actual mechanism of sex determination does not use.

None of this argues against the calendar as a cultural object. It is part of centuries of Chinese pregnancy folklore, it is a charming centerpiece for a gender-reveal party, and it gives families something to talk about with grandparents. The verdict on scientific evidence does not erase any of that. It simply locates the calendar correctly: it is a tradition, not a clinical tool. If you came to this page wanting the academic answer to the question “what does the science say,” the answer is that the science says ~50%, no statistical significance, no biological mechanism, and no clinical recommendation. For the consumer-facing summary of Chinese gender calendar accuracy, see our companion guide.

To explore the chart on its cultural terms instead — including the underlying logic, the lunar-age inputs, and how to read the grid — return to the Chinese Gender Calendar homepage, or read more from Sukie Chinese, the former Chinese-language teacher who writes the cultural-context guides on this site.

Frequently Asked Questions

Is there any peer-reviewed scientific evidence supporting the Chinese gender calendar?

No. The two most-cited peer-reviewed analyses that have looked specifically at the Chinese gender calendar, Katz et al. (1999) in the American Journal of Obstetrics and Gynecology and Villamor et al. (2005) in a Swedish population analysis, both found accuracy statistically indistinguishable from a 50/50 coin flip. No published study to date has demonstrated predictive accuracy above chance.

What was the sample size of the 1999 Katz study?

The Katz et al. 1999 analysis, published in the American Journal of Obstetrics and Gynecology, evaluated chart predictions against actual newborn sex in a cohort of several hundred births. The conclusion was that the calendar performed no better than chance, with no statistical significance.

Why do so many people swear the chart predicted correctly for them?

Because the chart will be right roughly half the time by pure chance. The people for whom it was correct remember it and share the story; the people for whom it was wrong tend to forget or not mention it. This is a textbook case of confirmation bias combined with selection bias in the kinds of testimonials that spread online.

Does the American College of Obstetricians and Gynecologists recommend the Chinese gender calendar?

No. ACOG does not list the Chinese gender calendar among medically validated methods for determining fetal sex. ACOG’s clinical guidance points to ultrasound, non-invasive prenatal testing (NIPT/cell-free DNA), chorionic villus sampling, and amniocentesis as the established methods for fetal sex determination.

How accurate is NIPT compared to the Chinese gender calendar?

Non-invasive prenatal testing analyzes fetal DNA fragments circulating in the mother’s blood from around 10 weeks of pregnancy and reports fetal sex with approximately 99% accuracy in validated clinical studies. The Chinese gender calendar, by contrast, has been measured at approximately 50% accuracy in the published peer-reviewed studies.

What does the 50% accuracy figure actually mean?

It means the chart’s predictions match newborn sex at roughly the same rate you would get by flipping a coin. Because the natural live-birth sex ratio is close to 50/50, any method that doesn’t actually contain real biological information will converge on that figure given a large enough sample, which is exactly what the peer-reviewed studies found.

If the science shows ~50%, why does the tradition still exist?

Because it is a cultural artifact, not a clinical instrument. The Chinese gender calendar has roots stretching back centuries and is treasured as part of pregnancy folklore, baby-reveal parties, and family conversation. Tradition does not need to outperform chance to remain meaningful — it just needs to remain enjoyable and culturally connective, which it does.

Note: This article reviews peer-reviewed scientific literature for educational purposes and is not medical advice. For any clinical question about fetal sex determination, prenatal screening, or pregnancy care, consult a licensed obstetrician or maternal-fetal medicine specialist.