• 1 Post
  • 200 Comments
Joined 1 year ago
Cake day: June 9th, 2023




  • The data are stored, so it’s not a live-feed problem. It is an inordinate amount of data that’s stored though. I don’t actually understand this well enough to explain it well, so I’m going to quote from a book [1]. Apologies for wall of text.

    “Serial femtosecond crystallography [(SFX)] experiments produce mountains of data that require [Free Electron Laser (FEL)] facilities to provide many petabytes of storage space and large compute clusters for timely processing of user data. The route to reach the summit of the data mountain requires peak finding, indexing, integration, refinement, and phasing.” […]

    "The main reason for [steep increase in data volumes] is simple statistics. Systematic rotation of a single crystal allows all the Bragg peaks, required for structure determination, to be swept through and recorded. Serial collection is a rather inefficient way of measuring all these Bragg peak intensities because each snapshot is from a randomly oriented crystal, and there are no systematic relationships between successive crystal orientations. […]

    Consider a game of picking a card from a deck of all 52 cards until all the cards in the deck have been seen. The rotation method could be considered as analogous to picking a card from the top of the deck, looking at it and then throwing it away before picking the next, i.e., sampling without replacement. In this analogy, the faces of the cards represent crystal orientations or Bragg reflections. Only 52 turns are required to see all the cards in this case. Serial collection is akin to randomly picking a card and then putting the card back in the deck before choosing the next card, i.e., sampling with replacement (Fig. 7.1 bottom). How many cards are needed to be drawn before all 52 have been seen? Intuitively, we can see that there is no guarantee that all cards will ever be observed. However, statistically speaking, the expected number of turns to complete the task, c, is given by: [c = n(1 + 1/2 + 1/3 + … + 1/n)], where n is the total number of cards. For large n, c converges to n*log(n). That is, for n = 52, it can reasonably be expected that all 52 cards will be observed only after about 236 turns! The problem is further exacerbated because a fraction of the images obtained in an SFX experiment will be blank because the X-ray pulse did not hit a crystal. This fraction varies depending on the sample preparation and delivery methods (see Chaps. 3–5), but is often higher than 60%. The random orientation of crystals and the random picking of this orientation on every measurement represent the primary reasons why SFX data volumes are inherently larger than rotation series data.

    The second reason why SFX data volumes are so high is the high variability of many experimental parameters. [There is some randomness in the X-ray pulses themselves]. There may also be a wide variability in the crystals: their size, shape, crystalline order, and even their crystal structure. In effect, each frame in an SFX experiment is from a completely separate experiment to the others."
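    The card-deck analogy quoted above can be checked with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, and treating each crystal orientation as one "card" and every frame as an independent draw is a big simplification of a real SFX experiment. It computes the expected number of draws, cross-checks it with a small simulation, and then scales it up for the blank frames the quote mentions.

```python
import random


def expected_draws(n: int) -> float:
    """Expected draws (with replacement) to see all n cards: n * (1 + 1/2 + ... + 1/n)."""
    return n * sum(1 / k for k in range(1, n + 1))


def simulate_draws(n: int, rng: random.Random) -> int:
    """Draw cards with replacement until every one of the n cards has been seen."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


def expected_frames(n: int, blank_fraction: float) -> float:
    """Scale the expected draws by the share of frames that actually hit a crystal."""
    return expected_draws(n) / (1 - blank_fraction)


rng = random.Random(0)  # fixed seed so the simulation is repeatable
trials = 2000
mean_simulated = sum(simulate_draws(52, rng) for _ in range(trials)) / trials

print(round(expected_draws(52)))        # 236
print(round(expected_frames(52, 0.6)))  # 590
print(mean_simulated)                   # should land close to 236
```

    So even in this toy version, a 60% blank rate (the figure from the quote) pushes the expected number of frames from about 236 to about 590 just to see each of 52 "orientations" once.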

    “The Realities of Experimental Data”: “The aim of hit finding in SFX is to determine whether the snapshot contains Bragg spots or not. All the later processing stages are based on Bragg spots, and so frames which do not contain any of them are useless, at least as far as crystallographic data processing is concerned. Conceptually, hit finding seems trivial. However, in practice it can be challenging.”

    “In an ideal case shown in Fig. 7.5a, the peaks are intense and there is no background noise. In this case, even a simple thresholding algorithm can locate the peaks. Unfortunately, real life is not so simple”
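    For the ideal case described in that quote (intense peaks, no background), a "simple thresholding algorithm" might look something like the sketch below. To be clear, this is not how CrystFEL or any facility pipeline does it: `count_peaks`, `is_hit`, the threshold and the minimum peak count are all invented for illustration, and the frame is a tiny hand-made grid rather than real detector data.

```python
from typing import List


def count_peaks(image: List[List[float]], threshold: float) -> int:
    """Count connected blobs of pixels brighter than the threshold (4-connectivity)."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    peaks = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Found a new bright pixel: flood-fill the whole blob it belongs to.
            peaks += 1
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return peaks


def is_hit(image: List[List[float]], threshold: float, min_peaks: int) -> bool:
    """Call a frame a hit if it contains at least min_peaks bright blobs."""
    return count_peaks(image, threshold) >= min_peaks


frame = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 0, 0, 0, 7],
    [0, 0, 0, 0, 0],
    [8, 0, 0, 0, 0],
]
blank = [[0] * 5 for _ in range(5)]

print(count_peaks(frame, threshold=5))          # 3
print(is_hit(frame, threshold=5, min_peaks=3))  # True
print(is_hit(blank, threshold=5, min_peaks=3))  # False
```

    With real background noise a single fixed threshold like this would either miss weak peaks or count noise as peaks, which is presumably the "real life is not so simple" part.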

    It’s very cool, I wish I knew more about this. A figure I found for the approximate data rate is 5 GB/s per instrument. I think that’s for the European XFEL.
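    To put the 5 GB/s figure next to the "many petabytes" in the quote, here is some back-of-the-envelope arithmetic. It assumes decimal units (1 PB = 1,000,000 GB) and an instrument recording continuously at that rate, which is almost certainly not how beam time works in practice, so treat the numbers as an upper-bound illustration only. The function names are made up.

```python
def stored_volume_tb(rate_gb_per_s: float, seconds: float) -> float:
    """Data volume in terabytes after recording for the given time (decimal units)."""
    return rate_gb_per_s * seconds / 1000


def seconds_to_fill(capacity_pb: float, rate_gb_per_s: float) -> float:
    """Seconds of continuous recording needed to fill the given capacity in petabytes."""
    return capacity_pb * 1_000_000 / rate_gb_per_s


RATE = 5  # GB/s per instrument, the figure quoted above

print(stored_volume_tb(RATE, 3600))         # 18.0 TB per hour
print(stored_volume_tb(RATE, 24 * 3600))    # 432.0 TB per day
print(seconds_to_fill(1, RATE) / 3600)      # roughly 55.6 hours to reach 1 PB
```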

    Citation: [1]: Yoon, C.H., White, T.A. (2018). Climbing the Data Mountain: Processing of SFX Data. In: Boutet, S., Fromme, P., Hunter, M. (eds) X-ray Free Electron Lasers. Springer, Cham. https://doi.org/10.1007/978-3-030-00551-1_7



  • He doesn’t directly control anything with C++; it’s just the data processing. The gist of X-ray crystallography is that we can shoot some X-rays at a crystallised protein, which will scatter the X-rays due to diffraction, then we can take the diffraction pattern formed and do some mathemagic to figure out the electron density of the crystallised protein and, from there, work out the protein’s structure.
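    A toy version of the "mathemagic" is a Fourier synthesis: the diffraction spots correspond to Fourier components (structure factors) of the electron density, and summing them back up gives a density map. The sketch below does this for a made-up one-dimensional "crystal" with two atoms. Everything in it (the atom positions, weights, function names, and the 1D setup) is invented for illustration. It also cheats in one important way: a real detector only records intensities, so the phases used here are lost and have to be recovered separately, which is the "phasing" step.

```python
import cmath
import math
from typing import Dict, List, Tuple


def structure_factors(atoms: List[Tuple[float, float]], h_max: int) -> Dict[int, complex]:
    """Fourier components F(h) for point atoms given as (fractional position, weight)."""
    return {
        h: sum(w * cmath.exp(2j * math.pi * h * x) for x, w in atoms)
        for h in range(-h_max, h_max + 1)
    }


def electron_density(factors: Dict[int, complex], n_points: int) -> List[float]:
    """Sum the Fourier components back up on a grid of n_points across the unit cell."""
    density = []
    for i in range(n_points):
        x = i / n_points
        total = sum(F * cmath.exp(-2j * math.pi * h * x) for h, F in factors.items())
        density.append(total.real)
    return density


# Hypothetical 1D "crystal": a lighter atom at 0.25 and a heavier one at 0.70.
atoms = [(0.25, 6.0), (0.70, 8.0)]
factors = structure_factors(atoms, h_max=15)

# What a detector would actually record: intensities only, with the phases gone.
intensities = {h: abs(F) ** 2 for h, F in factors.items()}

density = electron_density(factors, n_points=100)
strongest_index = max(range(len(density)), key=density.__getitem__)
print(strongest_index / 100)  # 0.7, the position of the heavier atom
```

    The density map peaks at the two atom positions, with the taller peak at the heavier atom. Going from `intensities` alone back to that map is the hard part in real life.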

    C++ helps with the mathemagic part of that, especially because by “high throughput”, I mean that the research facility has a particle accelerator that’s over 1km long, which cost multiple billions because it can shoot super bright X-ray pulses at a rate of up to 27,000 per second. It’s the kind of place that’s used by many research groups, and you have to apply for “beam time”. The sample is piped in front of the beam and the result is thousands of diffraction patterns that need to be matched to particular crystals. That’s where the challenge comes in.

    I am probably explaining this badly because it’s pretty cutting edge stuff that’s adjacent to what I know, but I know some of the software used is called CrystFEL. My understanding is that learning C++ was necessary for extending or modifying existing software tools, and for troubleshooting anomalous results.






  • In a genetic sense, it is a dysfunction of the gene that causes this. It’s neat because we can actually trace the history of human migrations by looking at the distribution of this particular allele (version of a gene). We have analysed DNA from ancient remains of early Europeans and found that the A allele is absent. It appears that this version of the gene first emerged in an ancient East Asian population.

    This gene also determines whether you have dry or sticky ear wax. It’s a neat gene because it’s uncommon for physical human traits to be controlled by one gene — most human characteristics are controlled by multiple genes (polygenic traits); ginger hair is another example of a monogenic trait. ABCC11 is neat because it affects multiple traits: sweat smell and earwax dryness.

    It might also be implicated in breast cancer risk (I can’t tell whether that’s in an increased risk or decreased risk), but we don’t really understand yet how that would work. From skimming the research, I would say we generally don’t understand how this gene works at all. We do know some stuff about it and how/why it works, but we’re still a decent way off of actually understanding its implications.




  • Ask her what her favourite episode is. Once you get small kids talking, it’s actually great, they tell such great stories.

    Share (age appropriate of course) opinions of your own along the way. Like, don’t just say “have you seen [episode with pots and pans]”, expand it by saying stuff like you’ve not seen much Bluey, but you have seen the one with the pots and pans — does she know the one you mean? I suggest this because kids are actually pretty socially adept and I’ve found myself in analogous situations where I caused confusion by mentioning something I barely knew and the kid reasonably interpreted this as “this person wants to talk about this thing”, and then when I didn’t seem to know anything about the topic I had suggested, the kid seemed pretty thrown off and uncertain how to respond.

    Or completely open ended questions, like “I know you like Bluey, but I’ve never seen it before. What’s your favourite episode?”, which could lead into asking for more details on what happened in that particular episode and why she likes it.

    The thing about small talk is that I’ve found there’s a distinction between being good at it, and enjoying it. I used to think I was awful at smalltalk, before I realised that actually, I just didn’t find it enjoyable. I think to some extent, the point isn’t to enjoy it, but to build a conversational back and forth rally which builds initial rapport to figure out what common ground exists between two people (which can lead to more enjoyable proper conversation). Some people do enjoy small talk though. The rally model was useful for me because it underscored how I need to serve the other person options to hit back with.

    For example, most kids go to school, so that’s a decent enough topic for if you’re running out of ideas. With kids, you can get away with clunky conversation starters like “What’s your favourite subject at school?”. Better than that though is something like “My favourite subject at school is science, what’s yours?” because it gives your conversation partner the option of responding either to your statement (such as with “ugh, I hate science, [teacher] is so mean!”), or your question, and having multiple options to hit back with allows for flow to help. Once you hit on a topic the kid is excited to talk about, you’re golden: just keep being interested in their perspective and give bits of your own perspective so they don’t feel like they’re being interrogated.

    Edit: This was a great question, btw OP — It’s led to a lot of interesting discussion, thanks for asking it






  • Honestly, trying not to beat myself up too much is a big part of it. This sounds like a wishy washy answer, but for real, I improved a heckton when I started giving reasonably accurate updates.

    E.g. Before: I have arranged to meet with a friend at 4pm. My travel time is 30 mins but can be variable due to traffic. It takes me 30 mins to get ready. 3:15 rolls round and I still haven’t started getting ready. If I hurried, I might make it in time. I do not do that. I cringe internally and end up indulging more deeply in whatever distraction caused me to overrun. 4pm rolls around and my friend messages to check in. Either I cancel, or I tell a lie and start getting ready, hating myself all the while. My friend is irked at me, and I don’t blame them.

    After: The same as before, but at 3:15, I message my friend to tell them I’ll be half an hour late (I round up to account for being bad at time). I end up being 5 or 10 minutes late nonetheless, but my friend isn’t annoyed, partly because I kept them in the loop about my progress. I still cringe at being late, but I find that over time, I get better at genuinely holding myself accountable, and at estimating time.