In January 2024, after decades in which intermittent improvements in virtual reality (VR) and augmented reality (AR) devices had failed to help the technology catch on with the public, tech behemoth Apple (Cupertino, CA, USA) began marketing a headset with by far the highest resolution and sharpest contrast of any commercially available device to date [1]. Yet sales of the Vision Pro (Fig. 1), Apple’s much-anticipated first entry in the VR/AR marketplace, have fallen far short of the company’s already low expectations [2].
Where does the line between effective confidence and annoying cockiness lie? There are plenty of reasons to be confused about the answer to this question. One study shows that confidence, even completely unearned confidence, pays off big time, while the next warns that outright bragging tends to backfire.
The rapid evolution of Large Language Models (LLMs) highlights the necessity of ethical considerations and data integrity in AI development, particularly the role of the FAIR (Findable, Accessible, Interoperable, Reusable) data principles. While these principles are crucial for ethical data stewardship, their specific application to LLM training data remains an under-explored area. This research gap is the focus of our study, which begins with an examination of existing literature to underline the importance of FAIR principles in managing data for LLM training. Building upon this, we propose a novel framework designed to integrate FAIR principles into the LLM development lifecycle. A contribution of our work is the development of a comprehensive checklist intended to guide researchers and developers in applying FAIR data principles consistently across the model development process. The utility and effectiveness of our framework are validated through a case study on creating a FAIR-compliant dataset aimed at detecting and mitigating biases in LLMs. We present this framework to the community as a tool to foster the creation of technologically advanced, ethically grounded, and socially responsible AI models.
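The abstract does not spell out how such a checklist is operationalized. As a minimal, hypothetical sketch, each FAIR criterion could be expressed as a machine-checkable predicate over a dataset's metadata record; the field names and the `fair_report` helper below are illustrative assumptions, not the paper's actual checklist or schema.

```python
# Hypothetical sketch: one illustrative, machine-checkable test per FAIR
# principle, evaluated against a dataset metadata record. All field names
# are assumptions for illustration, not the paper's checklist.

def fair_report(meta: dict) -> dict:
    """Return a pass/fail flag for one example check per FAIR principle."""
    return {
        # Findable: the dataset carries a globally unique, persistent ID.
        "findable": bool(meta.get("persistent_id")),
        # Accessible: a standardized retrieval endpoint is recorded.
        "accessible": bool(meta.get("access_url")),
        # Interoperable: metadata declares a shared vocabulary/schema.
        "interoperable": bool(meta.get("schema")),
        # Reusable: a usage license and provenance notes are attached.
        "reusable": bool(meta.get("license")) and bool(meta.get("provenance")),
    }

if __name__ == "__main__":
    example = {
        "persistent_id": "doi:10.0000/example",      # placeholder DOI
        "access_url": "https://example.org/dataset",  # placeholder URL
        "schema": "schema.org/Dataset",
        "license": "CC-BY-4.0",
        "provenance": "web crawl 2023; deduplicated; PII filtered",
    }
    for principle, ok in fair_report(example).items():
        print(f"{principle:13s} {'PASS' if ok else 'FAIL'}")
```

In practice such checks would run at each stage of the development lifecycle (collection, curation, training, release), which is consistent with the abstract's aim of applying the principles "consistently across the model development process."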
Turbulence is strongly associated with the vast majority of fluid flows in nature and industry. Traditionally, results given by direct numerical simulation (DNS) of the Navier-Stokes (NS) equations, which relate to a famous millennium problem, are widely regarded as ‘reliable’ benchmark solutions of turbulence, as long as the grid spacing is fine enough (i.e. less than the minimum Kolmogorov scale) and the time-step is small enough, say, satisfying the Courant-Friedrichs-Lewy condition (Courant number < 1). Is this really true? In this paper, a two-dimensional sustained turbulent Kolmogorov flow, driven by an external body force and governed by the NS equations under an initial condition with a spatial symmetry, is investigated numerically by two numerical methods with detailed comparisons: one is traditional DNS, the other is ‘clean numerical simulation’ (CNS). In theory, the exact solution must retain this spatial symmetry, since its initial condition is spatially symmetric. However, it is found that the numerical noise of the DNS is quickly enlarged to the same level as the ‘true’ physical solution, which finally destroys the spatial symmetry of the flow field. In other words, the DNS results of the turbulent Kolmogorov flow governed by the NS equations are mostly badly polluted by numerical noise. On the contrary, the numerical noise of the CNS remains much smaller than the ‘true’ physical solution of turbulence over a long enough interval of time, so that the CNS result stays very close to the ‘true’ physical solution and thus remains symmetric; it can therefore be used as a benchmark solution for comparison. Besides, it is found that numerical noise, as a kind of tiny artificial disturbance, can lead to huge deviations at large scales in the two-dimensional Kolmogorov turbulence governed by the NS equations, not only quantitatively (even in statistics) but also qualitatively (such as in the spatial symmetry of the flow). This strongly suggests that fine enough spatial grid spacing together with a small enough time-step alone cannot guarantee the validity of a DNS of the NS equations: it is only a necessary condition, not a sufficient one.
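The mechanism the abstract describes, round-off-level noise being amplified until it reaches the scale of the physical solution, can be illustrated without a full Kolmogorov-flow solver. The sketch below uses the Lorenz system as a deliberately simple chaotic stand-in (an assumption for illustration, not the paper's setup): two trajectories that initially differ by 1e-15, i.e. at the double-precision round-off level, end up completely decorrelated.

```python
# Illustrative stand-in (not the paper's Kolmogorov-flow solver):
# in a chaotic system, a perturbation at round-off level (~1e-15)
# grows exponentially until it reaches the size of the solution itself,
# mirroring how DNS round-off noise can swamp the 'true' trajectory.
import numpy as np

def lorenz_step(state, dt=0.002, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz equations (kept crude on purpose)."""
    x, y, z = state
    return state + dt * np.array([sigma * (y - x),
                                  x * (rho - z) - y,
                                  x * y - beta * z])

a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-15, 0.0, 0.0])   # 'numerical noise' at round-off level

for step in range(1, 30001):
    a, b = lorenz_step(a), lorenz_step(b)
    if step % 5000 == 0:
        sep = np.linalg.norm(a - b)
        print(f"t = {step * 0.002:5.1f}   separation = {sep:.3e}")
# The separation climbs from ~1e-15 to O(10), the diameter of the attractor:
# the artificial disturbance ends up at the same scale as the solution,
# just as the abstract reports for the DNS of the Kolmogorov flow.
```

In broad terms, CNS suppresses this effect by lowering the noise floor itself (high-order schemes combined with multiple-precision arithmetic), so that the amplified noise stays below the physical solution over the whole time interval of interest.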