Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor lyrics tests, do not search for empty metadata #5452

Open
wants to merge 14 commits into
base: adjust-sphinx-post-update
Choose a base branch
from

Conversation

snejus
Copy link
Member

@snejus snejus commented Oct 3, 2024

Description

Fixes #2635
Fixes #5133

I realised that #5406 has gotten too big, thus I'm splitting it into several smaller PRs.

This PR refactors lyrics plugin tests and fixes an empty metadata issue in the lyrics logic.

CI

  • Added --extras=lyrics to the Poetry install command to include the lyrics plugin dependencies.
  • In the main task which measures coverage, set LYRICS_UPDATED environment variable based on changes detected in the lyrics files.

Test setup

  • Introduced ConfigMixin to centralize configuration setup for tests, reducing redundancy. This can be used by tests based on pytest.

Lyrics logic

  • Trimmed whitespace from item.title, item.artist, and item.artist_sort in search_pairs function.
  • Added checks to avoid searching for lyrics if either the artist or title is missing.
  • Improved _scrape_strip_cruft function to remove Google Ads tags and unnecessary HTML tags.

Lyrics tests overhaul

  • Migrated lyrics tests to use pytest for better isolation and configuration management.
  • Deleted redundant lyrics text files and some unused utils.
  • Marked tests that should only run when lyrics source code is updated (LYRICS_UPDATED is set from the CI) using the on_lyrics_update marker.

Documentation and Dependencies

  • Added requests-mock version 1.12.1 to pyproject.toml and poetry.lock for mocking HTTP requests in tests.
  • Updated setup.cfg to include a new marker on_lyrics_update.

@snejus snejus self-assigned this Oct 3, 2024
@snejus snejus requested a review from bal-e October 3, 2024 13:41
@snejus snejus force-pushed the lyrics-refactor-tests branch 4 times, most recently from 626bf91 to 273e39b Compare October 3, 2024 19:11
.github/workflows/ci.yaml Show resolved Hide resolved
CONTRIBUTING.rst Outdated Show resolved Hide resolved
@snejus snejus force-pushed the lyrics-refactor-tests branch 3 times, most recently from 200cd19 to 9e67a56 Compare October 12, 2024 01:15
@snejus snejus changed the base branch from master to lyrics-fix-tekstowo October 12, 2024 01:39
@snejus snejus force-pushed the lyrics-refactor-tests branch 2 times, most recently from b4d78da to 038ca48 Compare October 12, 2024 01:48
test/plugins/test_lyrics.py Outdated Show resolved Hide resolved
test/plugins/test_lyrics.py Outdated Show resolved Hide resolved
Base automatically changed from lyrics-fix-tekstowo to master October 12, 2024 21:52
@snejus snejus force-pushed the lyrics-refactor-tests branch 2 times, most recently from 2fb6682 to d02e017 Compare October 12, 2024 22:10
@snejus snejus requested a review from bal-e October 12, 2024 22:18
@snejus snejus force-pushed the lyrics-refactor-tests branch 2 times, most recently from 54f06f9 to 19dff1c Compare October 18, 2024 21:14
@snejus
Copy link
Member Author

snejus commented Oct 18, 2024

@bal-e added a pair of additional commits as I noticed that LRCLib integrated tests have been testing no-op logic and failed once I configured it correctly.

I also added lyrics texts that we're expecting to receive from each of the sources we're testing. These will be helpful to have when changes are made in the backends' scraping logic.

@snejus snejus requested a review from jackwilsdon October 18, 2024 21:25
@snejus snejus force-pushed the lyrics-refactor-tests branch from 19dff1c to e70f61b Compare October 19, 2024 00:48
@snejus snejus force-pushed the lyrics-refactor-tests branch from e70f61b to 278279e Compare October 30, 2024 19:23
@snejus snejus force-pushed the lyrics-refactor-tests branch from 278279e to 4970dda Compare November 22, 2024 07:27
@snejus snejus mentioned this pull request Dec 7, 2024
3 tasks
@snejus snejus force-pushed the lyrics-refactor-tests branch from 4970dda to 09863b8 Compare December 7, 2024 16:56
@snejus snejus force-pushed the lyrics-refactor-tests branch 2 times, most recently from a4111b8 to 44b4b46 Compare January 12, 2025 20:36
@snejus snejus force-pushed the lyrics-refactor-tests branch from c5123f9 to a4b45e9 Compare January 12, 2025 22:56
@snejus snejus changed the base branch from master to adjust-sphinx-post-update January 12, 2025 23:02
@snejus snejus force-pushed the lyrics-refactor-tests branch from a4b45e9 to 2b80596 Compare January 12, 2025 23:09
@snejus snejus force-pushed the adjust-sphinx-post-update branch 5 times, most recently from d2eaa89 to 3bb8af8 Compare January 13, 2025 22:48
snejus added 14 commits January 13, 2025 22:54
Two google sources failed to return the expected output. I looked into
each case why parsing failed:

- lyrics on musica.com contain <aside> Google Ads
- each lyrics line on lacoccinelle.net is wrapped within alternating
  <em> and <strong> tags

Thus remove these tags as part of the HTML cleanup logic.
(#5102) Refactor lyrics tests which depended on local developer beets
configuration.
- Replaced unittest.mock with pytest fixtures for better test isolation and readability.
- Simplified test cases by using parameterized tests.
- Added `requests-mock` dependency to `pyproject.toml` and `poetry.lock`.
- Removed redundant helper functions and classes.
The test for GeniusLyrics was heavily patched and no longer provided
useful coverage. It has been removed to clean up the test suite.
- Consolidated multiple test cases into parameterized tests for better
  readability and maintainability.
- Simplified assertions by comparing lists of actual and expected
  artists/titles.
- Added `unexpected_empty_artist` marker to handle cases which
  unexpectedly return an empty artist. This seems to be happen when
  `artist_sort` field is empty.
Modified `search_pairs` function in `lyrics.py` to:

* Firstly strip each of `artist`, `artist_sort` and `title` fields
* Only generate alternatives if both `artist` and `title` are not empty
* Ensure that `artist_sort` is not empty and not equal to artist (ignoring
  case) before appending it to the artists

Extended tests to cover the changes.
Since at least one Backend requires album` and `duration` arguments
(`LRCLib`), the caller (`LyricsPlugin.fetch_item_lyrics`) must always
provide them.

Since they need to provided, we need to enforce this by defining them as
positional arguments.

Why is this important? I found that integrated `LRCLib` tests have been
passing, but they called `LRCLib.fetch` with values for `artist` and
`title` fields only, while the actual functionality *always* provides
values for `album` and `duration` fields too.

When I adjusted the test to provide values for the missing fields,
I found that it failed. This makes sense: Lib `album` and `duration`
filters are strict on LRCLib, so I was not surprised the lyrics could
not be found.

Thus I adjusted `LRCLib` backend implementation to only filter by each
of these fields when their values are truthy.
Add explicit checks for lyrics texts fetched from the tested sources.

- Introduced `LyricsPage` class to represent lyrics pages for integrated
  tests.
- Configured expected lyrics for each of the URLs that are being
  fetched.
- Consolidated integrated tests in a new `TestLyricsSources` class.
- Mocked Google Search API to return the lyrics page under test.
@snejus snejus force-pushed the lyrics-refactor-tests branch from 2b80596 to 02386cc Compare January 13, 2025 22:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants